{smcl}
{* *! version 3.9 of help file 14Feb2024 for -bivariate.ado- Version 3.9, 14Feb2024}{...}
{vieweralsosee "[R] tabstat" "mansection R tabstat"}{...}
{vieweralsosee "[MV] Linear discriminant analysis" "mansection MV discrimlda"}{...}
{vieweralsosee "[MI] Intro" "mansection MI Introsubstantive"}{...}
{vieweralsosee "" "--"}{...}
{vieweralsosee "correlate" "help correlate"}{...}
{vieweralsosee "casewise deletion" "help correlate##options_pwcorr"}{...}
{vieweralsosee "listwise deletion" "help missing##remarks"}{...}
{vieweralsosee "tabstat" "help tabstat"}{...}
{vieweralsosee "discrim lda" "help discrim_lda"}{...}
{vieweralsosee "Factor variables" "help fvvarlist"}{...}
{vieweralsosee "Factor variable expansion" "help fvrevar"}{...}
{vieweralsosee "Time-series variables" "help tsvarlist"}{...}
{vieweralsosee "estat summarize" "help estat_summarize"}{...}
{vieweralsosee "User-written hdfe" "help hdfe"}{...}
{vieweralsosee "User-written reghdfe" "help reghdfe"}{...}
{vieweralsosee "User-written frmttable" "help frmttable"}{...}
{vieweralsosee "User-written outreg" "help outreg"}{...}
{vieweralsosee "User-written sumtable" "help sumtable"}{...}
{vieweralsosee "User-written partchart" "help partchart"}{...}
{vieweralsosee "User-written esttab" "help esttab"}{...}
{viewerjumpto "Syntax" "bivariate##syntax"}{...}
{viewerjumpto "Description" "bivariate##description"}{...}
{viewerjumpto "Discussion and more on options" "bivariate##moreonoptions"}{...}
{viewerjumpto "Remarks" "bivariate##remarks"}{...}
{viewerjumpto "Remarks on fixed-effect options" "bivariate##optfe"}{...}
{viewerjumpto "Examples" "bivariate##examples"}{...}
{viewerjumpto "Sample output" "bivariate##smplout"}{...}
{viewerjumpto "Stored results" "bivariate##results"}{...}
{viewerjumpto "Acknowledgements" "bivariate##thanks"}{...}
{viewerjumpto "Author" "bivariate##author"}{...}
{hline}
help for {hi:bivariate}{right:{hi:Version 3.9, 14Feb2024}}
{hline}
{title:Title}
{hi:bivariate} - Bivariate correlations of a dependent variable with each independent/explanatory variable
{marker syntax}{...}
{title:Syntax}
{p 8 16 2}
{cmd:bivariate} {depvar} {indepvars} {ifin} {weight}
[{opth using(filename)}]
[{cmd:,} {it:options}]
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{synopt:{opt f:ormat}[{cmd:(%}{it:{help format:fmt}}{cmd:)}]}Display format
for all {cmd:bivariate} tables.
Invoking {opt format} without specifying a specific format selects format {cmd:%9.3f}.
Default format is {cmd:%11.5f} for all tables except that produced by
the {opt inspect} option, which has a default format of {cmd:%8.0f}.{p_end}
{syntab:Tabstat options}
{synopt:{opt tab:stat}[{cmd:(}{it:statlist}{cmd:)}]}Provide univariate summary statistics
on dependent and all independent variables using
{help correlate##options_pwcorr:casewise} deletion.
Optionally choose which {help tabstat} statistics to display by specifying
{it:statlist}.
See {help tabstat##statname:here} for a list of available statistics.{p_end}
{syntab:Inspect options}
{synopt:{opt ins:pect}[{cmd:(}{it:N_list}{cmd:)}]}Produce a table of the counts
of observations that are 0, positive, negative, unique, etc.
If {it:N_list} is omitted, the default is to provide all the counts produced
by Stata's {help inspect} command.
Optionally choose which {help inspect} statistics to display by specifying
{it:N_list}.
See {help inspect##results:here} for a list of available counts.{p_end}
{syntab:Groupstat options}
{synopt:{opt group}{cmd:(}{it:categorical_variable}{cmd:)}}{p_end}
{synopt:}Use Stata's {help discrim_lda} command and its postestimation command {cmd: estat grsummarize}
to construct a table of summary statistics for every value of the specified
{it: categorical_variable}. The bivariate correlation coefficients are calculated for the entire sample,
just as they would be without the {opt group} option.{p_end}
{synopt:{opt groupst:ats(statlist)}}{p_end}
{synopt:}Specify the summary statistics to be computed for each variable and each
value of {it: categorical_variable}. The available options are listed
{help discrim_estat##options_estat_grsummarize:here}.{p_end}
{synopt:{opt nowide}}Specify that the returned matrix of group statistics be in
{cmd:long} format. The default is {cmd:wide} format. This option applies only
to the display and return of results from the {opt group} option.{p_end}
{syntab:Covariance and correlate options}
{synopt:{opt cor:relate}}Report a table of correlation coefficients between
every pair of independent variables.{p_end}
{synopt:{opt cov:ariance}}Report a table of covariances between
every pair of independent variables.{p_end}
{syntab:Fixed effect options}
{synopt:{opt dem:ean}}Extract from the dependent variable {depvar} and all the
independent variables {indepvars} the fixed-effects defined by the option
{opt ab:sorb(fe_vars)}. Requires the {opt ab:sorb(fe_vars)} option.
To use the option {opt demean}, the user must have installed {help hdfe}, which is available
{net "describe hdfe, from(http://fmwww.bc.edu/RePEc/bocode/)":here}.{p_end}
{synopt:{opt fer2}}Add a column to the results table to display the R^2 from a
regression of each independent variable on the fixed effect variables {it:fe_vars}
specified by the {opt ab:sorb(fe_vars)} option. Requires the {opt ab:sorb(fe_vars)} option.
To use the option {opt fer2}, the user must have installed {help reghdfe}, which is available
{net "describe reghdfe, from(http://fmwww.bc.edu/RePEc/bocode/r)":here}.{p_end}
{synopt:{opt ab:sorb(fe_vars)}}Identify the categorical variables to be treated as "fixed effects"
in an estimation with dependent variable {it:depvar} and explanatory variables {it:indepvars}.
Use in combination with {opt fer2}, {opt dem:ean}, or both.{p_end}
{synopt:{opt time:it}}Report the time required to execute {opt fer2} or {opt dem:ean}.{p_end}
{syntab:Esoteric options}
{synopt:{opt novif}}Suppresses the computation and display
of the "Variance Inflation Factor." Unless suppressed,
the last column of the output presents the {cmd:VIF} score
from Stata's {help estat vif} command.{p_end}
{synopt:{opt unc:entered}}Specifies that the {cmd:VIF} should be
"uncentered." See {help estat vif} for details. This option is
not compatible with the option {opt novif}.{p_end}
{synopt:{opt l:ist}}Specifies that all {help fvvarlist:factor-variable} operators
be stripped from variables in {indepvars}.
As a result, {cmd:bivariate} does not expand these variables in the output.
{help tsvarlist:Time-series variables} are not affected by this option.{p_end}
{synopt:{opt obsg:ain}}Requests an additional column of output
which gives the number of additional observations that would be gained
by omitting each variable from {indepvars}. Specifying {opt obsg:ain}
automatically invokes the option {opt l:ist}.{p_end}
{synopt:{opt row:names(eqname|varname)}}{p_end}
{synopt:}For dummy variables
in the independent variables list,
appends the value label of the larger value of the dummy as either
the prefix (eqname) or the suffix (varname) of the row label.{p_end}
{synopt:{opt m:atrix(matrixname)}}Specifies the name of the matrix of
bivariate statistics returned as {cmd:r({it:matrixname})}.
The default is to return the matrix in {cmd:r(bivariate)}.{p_end}
{synopt:{opt rm:coll}}Remove collinear variables from the expanded list of
independent variables. This option is invoked automatically by
the option {opt group(varname)}.{p_end}
{synopt:{opt esa:mple(newvarname [, replace])}}{p_end}
{synopt:}Generates a new
variable named {it:newvarname} which is an indicator variable
equal to 1.0 for observations in the analysis sample
and zero otherwise. {p_end}
{synopt:{opt dep:var(varname)}}Specifies which of the variables listed after
the {cmd:bivariate} command is the dependent variable. Default is the first
listed variable.{p_end}
{synopt:{opt nodum:mies}}Suppresses the two columns in the results table
which otherwise would display the means
of {depvar} for each of the two values of each dummy variable.{p_end}
{syntab:Reporting options}
{synopt:{opt post}}Post the row vector of bivariate correlation coefficients
and the (diagonal) matrix of the variances of those correlations
to Stata's {help ereturn} space as {cmd:e(b)} and {cmd:e(V)}, so that
the correlation coefficients and their standard errors
can be stored with {help estimates store} and subsequently
displayed by {help estimates table}, etc.{p_end}
{synopt:{opth using(filename)}}Write {cmd:bivariate}'s output tables directly
to the MS Word or TeX file named {it:filename} using {help frmttable}.
To write to an MS Word file, the named {it:filename} must end with the extension ".doc",
not the newer MS Word extension ".docx".
{help frmttable} can be installed from Stata Journal's site
{net "describe sg97_5, from(http://www.stata-journal.com/software/sj12-4)":here}.
By default, if the {opt using(filename)} option is not specified,
{cmd:bivariate} displays a clickable {cmd:frmttable} command after every table
displayed in the results window. See {help frmttable:help for frmttable}
for more information on how to customize the MS Word or TeX output.{p_end}
{synopt:{opt addtable}}The {cmd:bivariate} tables are added to an existing
MS Word or TeX file specified by {opt using(filename)}.{p_end}
{synopt:{opt replace}}The MS Word or TeX file specified by {opt using(filename)}
can overwrite an existing file with the same name.{p_end}
{synopt:{opt addnote(string)}}Add a text string as a note at the bottom of each
table {cmd:bivariate} writes to the MS Word or TeX file.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} option.{p_end}
{synopt:{opt tb:ivariate(string)}}Replace the default title on the bivariate
results table written to disk.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} option.{p_end}
{synopt:{opt tt:abstat(string)}}Replace the default title on the {opt tab:stat}
results table written to disk.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} and the {opt tab:stat} options.{p_end}
{synopt:{opt tco(string)}}Replace the default title on the {opt cor:relate}
or {opt cov:ariance} results table written to disk.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} and either the {opt cor:relate}
or the {opt cov:ariance} option.{p_end}
{synopt:{opt ti:nspect(string)}}Replace the default title on the {opt ins:pect}
results table written to disk.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} and the {opt ins:pect} options.{p_end}
{synopt:{opt tg:rouptab(string)}}Replace the default title on the {opt group}
results table written to disk.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} and the {opt group} options.{p_end}
{synopt:{opt tex}}Write a TeX file instead of an MS Word file to disk.
Requires the user to have installed {help frmttable} and to specify the
{opt using(filename)} option.{p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}
{indepvars} may include {help fvvarlist:factor} and {help tsvarlist:time-series} variables.
At least one explanatory variable is required.{p_end}
{p 4 6 2}
{help by:by} is allowed, but only the
{help bivariate##results:results}
from the last {cmd:by} group are {help return:returned or posted}.
Alternatively, some of the functionality of {cmd:by} is available with the option
{opt group}{cmd:(}{it:categorical_variable}{cmd:)}.{p_end}
{p 4 6 2}
Support for {help weights} is in beta and should be used with care; see {help weight}.{p_end}
{marker description}{...}
{title:Description}
{pstd}
{cmd:bivariate} produces a table of bivariate Pearson correlation coefficients
between a dependent variable and each of a set of independent variables.
As in the syntax for {help regress} and most other Stata estimation commands,
the first variable listed after the {cmd:bivariate} command is assumed to be the
dependent variable. This default can be overridden using the option
{opt depvar}{cmd:(}{it:varname}{cmd:)} to specify another variable as the dependent variable.
All calculations are performed on the subset of observations not missing for any of the variables
in the analysis (i.e. casewise or {help correlate##options_pwcorr:listwise} deletion).
Also see the discussion of listwise deletion in the help for {help missing##remarks:missing values}.
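{pstd}
As a rough cross-check of the analysis sample, the sketch below counts the observations that survive casewise deletion.
It assumes the {cmd:auto} dataset shipped with Stata; the dataset, the variable list, and the
variable name {cmd:nmiss} are illustrative choices, not part of {cmd:bivariate}.{p_end}
{phang2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. * count the missing values in each observation across the analysis variables}{p_end}
{phang2}{cmd:. egen nmiss = rowmiss(price weight mpg rep78)}{p_end}
{phang2}{cmd:. * observations with no missing values form the casewise-deleted sample}{p_end}
{phang2}{cmd:. count if nmiss == 0}{p_end}
{phang2}{cmd:. bivariate price weight mpg rep78}{p_end}
{pstd}
Absent {cmd:if} or {cmd:in} restrictions and the {opt group} or {opt ab:sorb} options, the count should
match the sample on which {cmd:bivariate} computes its correlations.{p_end}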
{pstd}
Unless it is suppressed by the option {opt novif}, the last column of the output presents the
{browse "http://en.wikipedia.org/wiki/Variance_inflation_factor":variance inflation factor}
({cmd:VIF}) for each of the independent or explanatory variables as computed by Stata's {help estat vif} command.
For variables which {help regress} would omit from the right-hand-side variables
due to multicollinearity, the {cmd:VIF} value is undefined because it would be infinitely large.
With the option {opt rmcoll}, {cmd:bivariate} suppresses all information on such
multicollinear variables.
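{pstd}
The {cmd:VIF} column can be compared with the output of Stata's own commands.
A minimal sketch, assuming the {cmd:auto} dataset shipped with Stata (an illustrative choice of data and variables):{p_end}
{phang2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. * fit a regression whose right-hand side matches the independent variables}{p_end}
{phang2}{cmd:. regress price weight mpg foreign}{p_end}
{phang2}{cmd:. * report the centered VIF of each regressor}{p_end}
{phang2}{cmd:. estat vif}{p_end}
{phang2}{cmd:. * the last column of this table should report the same VIF values}{p_end}
{phang2}{cmd:. bivariate price weight mpg foreign}{p_end}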
{pstd}
If the {opt tab:stat} or {opt tab:stat(statlist)} option is chosen, {cmd:bivariate} executes Stata's {help tabstat}
command to generate a table of descriptive statistics estimated on all variables in the analysis.
Statistics are estimated over the observations which are non-missing on all of the variables
in the analysis as well as non-missing on the variables specified in the options
{opt group(varname)} and/or {opt ab:sorb(fe_vars)}, if any.
{pstd}
With the {opt group(varname)} option, after presenting the results table on the entire sample,
{cmd:bivariate} groups the observations by categories of the grouping variable
{it:varname} and displays the summary statistics for each category in a separate column.
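{pstd}
A minimal sketch of the {opt group} option, assuming the {cmd:auto} dataset shipped with Stata, with
{cmd:foreign} as an illustrative grouping variable:{p_end}
{phang2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. * correlations on the full sample, then summary statistics by value of foreign}{p_end}
{phang2}{cmd:. bivariate price weight mpg, group(foreign)}{p_end}
{phang2}{cmd:. * same request, with the group statistics displayed and returned in long format}{p_end}
{phang2}{cmd:. bivariate price weight mpg, group(foreign) nowide}{p_end}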
{pstd}
{cmd:bivariate} accepts {help fvvarlist:factor variables} within the list of
independent variables, {indepvars}.
Unless the {opt obsg:ain} or {opt l:ist} option is
specified, {cmd:bivariate} expands {help fvvarlist:factor variables} into a set of K dummy variables
so that the correlation of each with the dependent variable is separately revealed.
The {cmd:VIF} column of the table of {cmd:bivariate} results will display a missing value
for the {cmd:VIF} of the dummy variable that Stata would drop in a regression on the entire
set of dummies. To force a different selection of the category to be dropped,
specify the desired base level as explained {help fvvarlist##bases:here}.
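{pstd}
The base level can be changed with Stata's standard {cmd:ib}{it:#}{cmd:.} operator.
A sketch assuming the {cmd:auto} dataset shipped with Stata; the choice of level 3 as the base is arbitrary:{p_end}
{phang2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. * default base level: Stata picks the dummy to drop}{p_end}
{phang2}{cmd:. bivariate price weight i.rep78}{p_end}
{phang2}{cmd:. * make rep78==3 the base level instead}{p_end}
{phang2}{cmd:. bivariate price weight ib3.rep78}{p_end}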
{pstd}
{cmd:bivariate} accommodates high-dimensional fixed effects. See the remarks on
the {help bivariate##optfe:fixed-effect options}.
{marker moreonoptions}{...}
{title:Discussion and further information on selected options}
{pstd}
The typical estimation command estimates coefficients which describe how a set
of explanatory or "independent" variables are associated with one or more dependent variables
within the context of a model. Most commands use multiple independent variables
such that the coefficient estimate for a single independent variable is
"controlled" for the possibly "confounding" effects of the other independent variables.
The inferences drawn from an estimated model about the effects of each independent
variable are potentially affected by several key properties of
the independent variables. Among these properties are:{p_end}
{phang}. the simple correlation of each independent variable, individually, with the dependent variable{p_end}
{phang}. the collinearity of each independent variable with all of the other independent variables{p_end}
{phang}. the number of observations which could be gained for the entire model if a given independent variable is either dropped from the model or imputed{p_end}
{phang}. the mean and dispersion and other descriptive statistics of each independent variable for the estimation sample{p_end}
{phang}. the number of zeros, unique values, integers, etc. for each independent variable{p_end}
{phang}. for fixed effect estimation, the proportion of each variable's variation explained by the fixed effects{p_end}
{pstd}
The {help bivariate##examples:examples}
demonstrate some of the capabilities of {cmd:bivariate} described
in the rest of this section.
See {help bivariate##syntax:Syntax} for a complete list of options.{p_end}
{phang}
{opt tab:stat} or {opt tab:stat(statlist)} requests an optional table of univariate descriptive statistics
on the complete list of variables, with {help correlate##options_pwcorr:casewise} deletion. The table is saved
in matrices {cmd:r({it:StatTotal})} and {cmd:r({it:TransposedST})}.
If {it:statlist} is not supplied, this option produces a default {help tabstat}
table containing the statistics: mean, p50, sd, cv, min, max, skewness.
Alternatively the user can specify a list of statistics selected from those
available {help tabstat##statname:here}.
Unless explicitly requested, the number
of observations is excluded from the tabstat output on each variable,
but is instead separately saved in {cmd:r({it:N})}.
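{pmore}
A minimal sketch, assuming the {cmd:auto} dataset shipped with Stata; the dataset, variables, and
the four statistics chosen are illustrative:{p_end}
{pmore2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{pmore2}{cmd:. sysuse auto, clear}{p_end}
{pmore2}{cmd:. * request four tabstat statistics in addition to the bivariate table}{p_end}
{pmore2}{cmd:. bivariate price weight mpg foreign, tabstat(mean sd min max)}{p_end}
{pmore2}{cmd:. * inspect the saved matrix of descriptive statistics}{p_end}
{pmore2}{cmd:. matrix list r(StatTotal)}{p_end}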
{phang}
{opt ins:pect} or {opt ins:pect(N_list)} requests an optional table composed
of the information produced by Stata's {help inspect} command for each
variable in the analysis, with {help correlate##options_pwcorr:casewise} deletion.
After execution of {cmd:bivariate}, the table is returned in matrix {cmd:r({it:inspect})}.
If {it:N_list} is not supplied, this option produces a default 8-column
table containing the statistics: N, N_0, N_pos, N_neg, N_unique, etc.
Alternatively the user can specify a list of statistics selected from those
available {help inspect##results:here}. Note that N_unique will display
as missing if the number of unique values is greater than 99.{p_end}
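{pmore}
A minimal sketch using the default counts, assuming the {cmd:auto} dataset shipped with Stata
(an illustrative choice of data and variables):{p_end}
{pmore2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{pmore2}{cmd:. sysuse auto, clear}{p_end}
{pmore2}{cmd:. * add the table of inspect counts for every variable in the analysis}{p_end}
{pmore2}{cmd:. bivariate price weight mpg rep78, inspect}{p_end}
{pmore2}{cmd:. * list the returned matrix of counts}{p_end}
{pmore2}{cmd:. matrix list r(inspect)}{p_end}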
{phang}
{opt obsg:ain} requests an additional column of output
which gives the number of additional observations that would be gained
by omitting each variable from {indepvars}.
In order to work properly with {help fvvarlist:factor variables}, this option
automatically invokes {cmd:bivariate}'s {opt l:ist} option.
The variable which most constrains the sample size is returned
in the macro {cmd:r(maxobsgainvar)} and the number of observations potentially added
by omitting that variable is returned in {cmd:r(maxobsgain)}.{p_end}
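{pmore}
A minimal sketch, assuming the {cmd:auto} dataset shipped with Stata. In that dataset {cmd:rep78} is
the only variable in the list below with missing values, so it is the natural candidate for the
most constraining variable; the data and variable list are illustrative.{p_end}
{pmore2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{pmore2}{cmd:. sysuse auto, clear}{p_end}
{pmore2}{cmd:. * add the column of observations gained by omitting each variable}{p_end}
{pmore2}{cmd:. bivariate price weight mpg rep78 foreign, obsgain}{p_end}
{pmore2}{cmd:. * name of the variable that most constrains the sample}{p_end}
{pmore2}{cmd:. display "`r(maxobsgainvar)'"}{p_end}
{pmore2}{cmd:. * number of observations gained by omitting that variable}{p_end}
{pmore2}{cmd:. display r(maxobsgain)}{p_end}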
{phang}
{opt fer2} and {opt dem:ean} enable {cmd:bivariate} to reveal how fixed-effects
change the bivariate and multivariate properties of the variables.
These two options can be specified individually or together.
Either of these options can be time-consuming on
a large data set or with a large number of independent variables.
The option {opt time:it} reports the time required to invoke these options.
See {help bivariate##optfe:below} for more information.{p_end}
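{pmore}
A minimal sketch combining both options, assuming the {cmd:auto} dataset shipped with Stata and
treating {cmd:rep78} as the fixed-effect variable purely for illustration. It also assumes that the
user-written {help hdfe} and {help reghdfe} are already installed.{p_end}
{pmore2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{pmore2}{cmd:. sysuse auto, clear}{p_end}
{pmore2}{cmd:. * demean all variables by rep78, add the fixed-effect R^2 column, and time both steps}{p_end}
{pmore2}{cmd:. bivariate price weight mpg, absorb(rep78) fer2 demean timeit}{p_end}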
{phang}
{opt row:names(eqname|varname)} For dummy or other {help fvvarlist:factor variables}
in the independent variables list, this option appends the value label of the larger value
of the dummy as either the prefix ({cmd:eqname}) or the suffix ({cmd:varname})
of the row label. Try using {opt row:names(eqname)}. {p_end}
{phang}
{opt format} and {cmd:format(%}{it:{help format:fmt}}{cmd:)} specify how the descriptive
statistics are to be formatted. The default is to use a {cmd:%9.0g} format.
{pmore}
{opt format} without {cmd:(%}{it:{help format:fmt}}{cmd:)} specifies that each variable's statistics be formatted
with the variable's current display format; see {manhelp format D}.
{pmore}
{cmd:format(%}{it:{help format:fmt}}{cmd:)} specifies the format to be used for all
statistics. The maximum width of the specified format should not exceed
nine characters.
{phang}
{opt l:ist} specifies that all factor-variable operators and
time-series operators be removed from {indepvars} and that the descriptive statistics
output be computed on the resulting list of base variables.
This is an option of the Stata command {help fvrevar}.
Use this option with care, since statistics like the mean
or the correlation coefficient which are computed on a categorical
factor variable are likely to be nonsense.
This option is automatically invoked by the option {opt obsg:ain}.{p_end}
{phang}
{opth m:atrix(matrixname)} designates a specific name for the saved matrix of bivariate results.
If the option is chosen, the matrix of bivariate results is saved as {cmd:r({it:matrixname})}.
The default name is {cmd:r(bivariate)}.
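{pmore}
A minimal sketch, assuming the {cmd:auto} dataset shipped with Stata; the matrix names {cmd:bv} and
{cmd:B} are hypothetical:{p_end}
{pmore2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{pmore2}{cmd:. sysuse auto, clear}{p_end}
{pmore2}{cmd:. * return the bivariate results under the name bv instead of the default}{p_end}
{pmore2}{cmd:. bivariate price weight mpg foreign, matrix(bv)}{p_end}
{pmore2}{cmd:. * copy the returned matrix before another r-class command overwrites it}{p_end}
{pmore2}{cmd:. matrix B = r(bv)}{p_end}
{pmore2}{cmd:. matrix list B}{p_end}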
{phang}
{opt post} posts the results so that the user can subsequently access the bivariate
correlations and their standard errors with the table-making commands
{help estimates table}, {help esttab}, {help outreg}, {help outreg2}
and other commands that read coefficients and standard errors from Stata's
{help ereturn} space. The bivariate correlation coefficients between
the dependent variable and each of the independent variables are stored
in the row vector {cmd:e(b)}, while their squared standard errors
are stored as the diagonal elements of the matrix {cmd:e(V)}.
The sample used to generate these estimates is marked by the
function {cmd:e(sample)}. See {help bivariate##results:Stored results}.{p_end}
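{pmore}
A minimal sketch of the workflow, assuming the {cmd:auto} dataset shipped with Stata; the stored
name {cmd:biv} is hypothetical:{p_end}
{pmore2}{cmd:. * assumption: auto.dta stands in for the user's data}{p_end}
{pmore2}{cmd:. sysuse auto, clear}{p_end}
{pmore2}{cmd:. * post the correlations and their variances to e(b) and e(V)}{p_end}
{pmore2}{cmd:. bivariate price weight mpg foreign, post}{p_end}
{pmore2}{cmd:. * store the posted results under a name of the user's choosing}{p_end}
{pmore2}{cmd:. estimates store biv}{p_end}
{pmore2}{cmd:. * display the correlations with their standard errors}{p_end}
{pmore2}{cmd:. estimates table biv, se}{p_end}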
{marker remarks}{...}
{title:Remarks}
{pstd}
Prior to estimating any statistical model involving one or more dependent variables
and a set of explanatory or "independent" variables, the user typically
wants to understand the information content of individual variables
and the potential interactions among them which could cause problems of
multicollinearity and confounding. Common Stata commands for this purpose include
{help codebook}, {help inspect},
{help tabstat}, {help mean}, {help correlate}, {help estat summarize}
and the user-written {help partchart}. This program, {cmd:bivariate}, adds to this
collection of commands the ability to make and export a table
containing the simple Pearson correlation coefficient between the dependent variable
and each of the independent variables, all computed on the estimation sample.
{pstd}
The {cmd:bivariate} command is designed to accompany an estimation command
like {help regress} in order
to provide bivariate and (optionally) descriptive statistics
on the same variables over the same sample
as used in the estimation command. Like {help regress}
and most estimation commands,
the {cmd:bivariate} command assumes the first variable is the dependent
variable and the rest are independent variables.
{pstd}
The basic syntax for {cmd:bivariate} looks like the basic syntax for
{help regress}. Analogous to the command... {p_end}
{phang}{cmd:. regress price weight mpg i.rep78 foreign}{p_end}