Data Science Research and Research Interests

My data science research covers a breadth of issues related to social phenomena, ranging from the roles actors adopt in networks to the factors influencing death penalty executions. To date, my research has produced over half a million dollars in federal grant funding, a Supreme Court citation, and 15 peer-reviewed publications.

My substantive research interests include:

  • Substance use disorder treatment

  • Social interdependencies

  • Peace science

  • Alliance politics

  • Systemic theory


Methodologically, I utilize the tools of:

  • Network analysis

  • Machine learning

  • Time series analysis

  • Survival analysis

The following is a sample (not a census) of some of my work, organized thematically.

 
 

Methodological and Statistical Development

Kapferer Network Role Assignment, Wave 1.  In-Groupers, Purple. Out-Groupers, Orange. Movers, Purple.

Detecting Heterogeneity and Inferring Latent Roles in Longitudinal Networks

Network analysis has typically examined the formation of whole networks while neglecting variation within or across networks. Actors within networks often adopt particular roles.  While cross-sectional approaches for inferring latent roles exist, there is a paucity of approaches for considering roles in longitudinal networks.  This paper explores the conceptual dynamics of temporally observed roles while deriving and introducing a novel statistical tool, the ego-TERGM, capable of uncovering these latent dynamics.  Estimated through an Expectation-Maximization algorithm, the ego-TERGM is quick and accurate in classifying roles within a broader temporal network.  An application to the Kapferer strike network illustrates the model's utility.

Political Analysis, 2018

Software on CRAN

 
Decomposition of Recommender System Influence.

Inferring Influence Networks from Longitudinal Bipartite Relational Data

with Frank Marrs, Bailey Fosdick, Skyler Cranmer, and Tobias Böhmelt

Longitudinal bipartite relational data characterize the evolution of relations between pairs of actors, where each actor is one of two distinct types and relations exist only between disparate actor types.  An example of such data is countries’ treaty ratifications: in a given year, a relation is generated between country i and treaty k if i ratifies k.  A common task in examining bipartite data is estimating the influence network between actors of the same type.  That is, if state i ratifies a treaty in a given year, is it likely that state j ratifies the same treaty in a future year?  This task is typically accomplished by projecting the bipartite network to a one-mode network, for which generative models are lacking.  To address this shortcoming, we propose a generative model for (discrete-time, weighted) longitudinal bipartite relational data to infer (weighted, directed) influence networks.  We compare the performance of the proposed model to existing models on a simulated dataset and on a dataset of relations between states.  Our approach provides improved interpretability and estimability over existing models while performing at least as well in explaining data variation and predicting out of sample.  As our model may be expressed as a linear regression, it extends readily to unweighted and categorical relational data.  Model parsimony is achieved via regularization or by applying reduced rank structure to the estimated influence networks.  Finally, we prove that our model asymptotically captures the desired dependencies, even under misspecification.

Journal of Computational and Graphical Statistics, 2020

Software on CRAN
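To make the projection step mentioned in the abstract above concrete, here is a minimal sketch in Python. The function name `project_countries` and the toy ratification matrix are hypothetical and are not taken from the paper or its CRAN software. The sketch illustrates only the conventional one-mode projection that the paper takes as its point of departure, not the proposed generative model.

```python
def project_countries(ratified):
    """One-mode projection of a country-by-treaty 0/1 matrix.

    Entry [i][j] of the result counts the treaties ratified by both
    country i and country j; the diagonal counts each country's own
    ratifications.
    """
    n = len(ratified)
    return [
        [sum(a * b for a, b in zip(ratified[i], ratified[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 countries (rows) by 4 treaties (columns).
ratified = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
projection = project_countries(ratified)
```

Because the projected matrix is symmetric and discards the timing of ratifications, it cannot by itself indicate which country influenced which.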

 
FERGM Improvements over ERGM.

Substantive Implications of Unobserved Heterogeneity: Testing the Frailty Approach to Exponential Random Graph Models

with Jan Box-Steffensmeier, Dino Christenson, and Jason Morgan

Exponential Random Graph Models (ERGMs) are an increasingly common tool for inferential network analysis.  However, a potential problem for these models is the assumption of correct model specification. Through six substantive applications (Mesa High, Florentine Marriage, Military Alliances, Militarized Interstate Disputes, Regional Planning, Brain Complexity), we illustrate how unobserved heterogeneity and confounding lead to degenerate model specifications, inferential errors, and poor model fit.  In addition, we present evidence that a better approach exists in the form of the Frailty Exponential Random Graph Model (FERGM), which extends the ERGM to account for unit or group-level heterogeneity in tie formation. In each case, the ERGM is prone to producing inferential errors and forecasting ties with lower accuracy than the FERGM.

Social Networks, 2019

Software on CRAN

 

Applications of Role Analysis

Role Assignments from Ego-TERGM.  Aggregators, Red. Balancers, Gold. Reformers, Blue. Consolidators, Off White.

Measuring and Assessing Latent Variation in Alliance Design and Objectives

The alliance politics literature foundationally assumes that states, motivated by an external threat, form alliances to ensure survival.  Treating alliance objectives as homogeneous assumes that the process generating alliances has never changed and that their objectives do not vary, yielding underdeveloped theories and potentially problematic inferences. Existing studies have been handcuffed by insufficient data on the goals underpinning alliance design and by a lack of methods capable of inferring these latent objectives. I propose a novel role-based framework for considering alliance design and objectives, which enumerates the roles a state can adopt within the alliance network and considers the relationship between a state's role in that network and how it designs its local alliance network to accomplish role-based objectives.  To detect this unobserved variation, I employ a novel methodological tool, the ego-TERGM.  Results indicate that the conventional model of alliances is inadequately specified and that scholars should consider the varying alliance network roles states adopt.

 
Roles in the Interest Group Coalition Network, Environmental Cases.

Role Analysis Using the Ego-ERGM: A Look at Environmental Interest Group Coalitions

with Jan Box-Steffensmeier, Dino Christenson, and Zack Navabi

Interest groups coordinate to achieve political goals.  However, these groups are heterogeneous, and the division of labor within these coalitions varies.  We explore the presence of distinct roles in coalitions of environmental interest groups, and analyse which factors predict if an organization takes on a particular role.  To model these latent dynamics, we introduce the ego-ERGM.  We find that a group's budget, member size, staff size, and degree centrality are influential in distinguishing between three role assignments.  These results provide insight into the roles adopted in carrying out coalition tasks.  This approach shows promise for understanding a host of networks.

Social Networks, 2018

 

Survival Analysis, Event Dependence, and the Death Penalty

Event Dependence in U.S. Executions

with Jan Box-Steffensmeier and Frank Baumgartner

Since 1976, the United States has seen over 1,400 judicial executions concentrated in only a few states and counties.  The number of executions across counties appears to fit a stretched distribution.  Our tests estimate the monthly hazard of an execution in a given county, accounting for the number of previous executions, homicides, poverty, population demographics, and state, covering the entire period from 1976 through 2015, across all counties in states with the death penalty. We find that the number of prior executions a county experiences increases the probability of the next execution, in addition to accelerating the rate at which executions occur.  Once a jurisdiction goes down a given path, it tends to accelerate, causing the counties to separate out into those never executing (the vast majority of counties) and those which use the punishment frequently. This finding is of great legal and normative concern, and ultimately, may not be consistent with the equal protection clause of the U.S. Constitution.

PLOS ONE, 2018

Covered in Multiple Media Outlets

 
Geographic Distribution of Death Sentences.

Geographic Distribution of Death Sentences.

Learning to kill: Why a small handful of counties generates the bulk of US death sentences

with Frank Baumgartner, Jan Box-Steffensmeier, Christian Caron, and Hailey Sherman

We demonstrate strong self-referential effects in county-level data concerning use of the death penalty. We first show event-dependency using a repeated-event model. Higher numbers of previous events reduce the expected time delay before the next event. Second, we use a cross-sectional time-series approach to model the number of death sentences imposed in a given county in a given year. This model shows that the cumulative number of death sentences previously imposed in the same county is a strong predictor of the number imposed in a given year. Results raise troubling substantive implications: The number of death sentences in a given county in a given year is better predicted by that county’s previous experience in imposing death than by the number of homicides. This explains the previously observed fact that a large share of death sentences come from a small number of counties and documents the self-referential aspects of use of the death penalty. A death sentencing system based on racial dynamics and then amplified by self-referential dynamics is inconsistent with equal protection of the law, but this describes the United States system well.

PLOS ONE, 2020

 

Social Network Treatments for Substance Abuse Disorder

Therapeutic community graduates cluster together in social networks: Evidence for spatial selection as a cooperative mechanism in therapeutic communities

with Skyler Cranmer, Carole Harvey, and Keith Warren

It is natural to think of affirmations in TCs as simply a means of positively reinforcing prosocial behavior, a distributed equivalent of the behavioral reward systems that play an important role in many correctional rehabilitation programs. TC clinical theory and research suggests that this reductionist assumption is inadequate to understand the role of affirmations in the TC, where they serve as a motivational tool and as a means of counterbalancing the peer corrections that also play a key role in treatment. This analysis adds that affirmations may also play a role in establishing and reinforcing networks of individuals who are cooperating to overcome substance abuse. The reductionist techniques that underlie most studies of mental health and substance abuse treatment are inadequate for analyzing such networks. Systems science techniques such as social network analysis, which allow researchers to consider the interactions between TC residents, are necessary if we are to fully understand the way in which the TC community acts as a method of clinical intervention in these programs.

Addictive Behaviors, 2018

 
Relationship between network clustering in a therapeutic community and reincarceration following discharge

with Skyler Cranmer, Nathan Doogan, Carole Harvey, and Keith Warren

Recent qualitative work on Therapeutic Communities (TCs) suggests that they help residents change by creating an environment that is simultaneously challenging and supportive. There is evidence that social networks that feature numerous closed triads are both more supportive and more likely to influence individual behavior.  This implies that TC residents whose peer social networks include more closed triads should have improved outcomes. The social network in this study consists of the affirmations exchanged between 1312 men who resided at a 90-bed TC in a Midwestern state over a period of eight years and includes a total of 34,667 weighted edges.  The network was analyzed using a semiparametric Cox model based on the Temporal Network Autocorrelation Model (TNAM), a statistical methodology that accounts for dependence between individuals in the network.  Participants whose social networks of TC peers included a higher percentage of closed triads were at a decreased hazard of reincarceration following termination when controlling for age, length of stay, and the number of eventually graduating peers who affirmed the resident.  These results support the longstanding TC contention that the community as a whole is the method of clinical treatment.  Further quantitative research into TC processes and outcomes should ideally include social network surveys and statistics in order to avoid biases associated with violations of statistical independence assumptions.

Journal of Substance Abuse Treatment, 2018 
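As a rough illustration of what a "percentage of closed triads" means for one resident, the following Python sketch computes the share of pairs of a person's peers who are themselves connected (the standard local clustering coefficient). The function name `closed_triad_share` and the toy network are hypothetical. The sketch assumes an unweighted, undirected network and is not the weighted measure or the TNAM-based Cox model used in the study.

```python
from itertools import combinations


def closed_triad_share(neighbors, node):
    """Share of pairs of `node`'s peers that are themselves tied."""
    peers = neighbors[node]
    pairs = list(combinations(sorted(peers), 2))
    if not pairs:
        # Fewer than two peers: no triad can form around this node.
        return 0.0
    closed = sum(1 for u, v in pairs if v in neighbors[u])
    return closed / len(pairs)


# Hypothetical undirected affirmation network among four residents.
neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
share_a = closed_triad_share(neighbors, "a")
```

In this toy network, resident "a" has three peers, and only one of the three possible peer pairs ("b" and "c") is tied, so one third of the triads around "a" are closed.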

 
Building the community: Endogenous network formation, homophily and prosocial sorting among therapeutic community residents

with Keith Warren, Skyler Cranmer, George De Leon, Nathan Doogan, Mackenzie Weiler, and Fiona Doherty

BACKGROUND: Researchers have begun to consider the ways in which social networks influence therapeutic community (TC) treatment outcomes. However, there are few studies of the way in which the social networks of TC residents develop over the course of treatment.

METHODOLOGY: We used a Temporal Exponential Random Graph Model (TERGM) to analyze changes in social networks totaling 320,387 peer affirmations exchanged between residents in three correctional TCs, one of which serves men and two of which serve both men and women. The networks were analyzed within weekly and monthly time-frames.

RESULTS: Within a weekly time-frame residents tended to close triads. Residents who were not previously connected tended not to affirm the same peers. Residents showed homophily by entry cohort. Other results were inconsistent across TC units. Within a monthly time-frame participants showed homophily by graduation status. They showed the same patterns of triadic closure when connected, tendency not to affirm the same peers when not connected and homophily by cohort entry time as in a weekly time frame.

CONCLUSIONS: TCs leverage three human tendencies to bring about change. The first is the tendency of cooperators to work together, in this case in seeking graduation. The second is the tendency of people to build clusters. The third is homophily, in this case by cohort entry time. Consistent with TC clinical theory, residents spread affirmations to a variety of peers when they have no previous connection. This suggests that residents balance network clustering with a concern for the community as a whole.

Drug and Alcohol Dependence, 2020 

 
Predicted Probability Curves for Clustering Coefficient

Tightly Bound: The Relationship of Network Clustering Coefficients and Reincarceration at Three Therapeutic Communities

with Keith Warren and Skyler Cranmer

OBJECTIVE: Clustering, the tendency of individuals to form closed triads, is ubiquitous in human social networks. Previous research has found that therapeutic community (TC) residents whose social networks include a high degree of clustering are less likely to be reincarcerated following discharge. In this study, we test this finding in a larger number of TCs.

METHOD: We use a temporal network autocorrelation model (TNAM) to analyze clustering in social networks of affirmations exchanged between TC residents as a predictor of the hazard of reincarceration. The networks were drawn from three corrections-based TCs, two of which include both men’s and women’s units and one of which housed only men.

RESULTS: The findings were inconsistent across facilities. Increased clustering correlates with a reduced hazard of reincarceration for women at both facilities (β = -3.274, 95% CI [-4.299, -2.238]; β = -18.233, 95% CI [-32.370, -4.095]) and for men at two of the facilities (β = -0.910, 95% CI [-1.213, -0.606]; β = -1.393, 95% CI [-1.825, -0.961]). However, clustering increased the hazard of reincarceration for men at one facility (β = 5.558, 95% CI [4.124, 6.993]).

CONCLUSIONS: These results support the idea that the likelihood of reincarceration following discharge from a TC is predicted by clustering, a network structure that occurs at a system level between the individual resident and the entire community. Inconsistency in the direction of the relationship suggests that future research should analyze predictors of prosocial clustering in TCs.

Journal of Studies on Alcohol and Drugs, 2020

 

Examining Social Interdependencies

Network of Correlates of War Fatal Militarized Interstate Disputes, 1938.

Triangulating War: Network topology and the democratic peace

with Skyler Cranmer and Bruce Desmarais

The principal finding of Peace Science, the democratic peace, rests upon a dyadic understanding of conflict; decades of research have found, with law-like regularity, that jointly democratic dyads do not engage in conflict.  However, such conflicts may rarely reflect purely dyadic phenomena: even if a conflict is not multi-party, multiple states may be engaged in distinct disputes with a common enemy.  We postulate a network theory of conflict which indicates that the democratic peace is a function of the competing interests of mixed-regime dyads and the strategic inefficiencies associated with fighting enemies of enemies.  We find strong evidence of interdependence in the conflict network: a state's decision to attack another is conditioned upon a third state having that same target.  When accounting for this effect, evidence of the democratic peace is much less clear.

 
Network of Positive State Influence Relationships

Latent influence networks in global environmental politics

with Tobias Böhmelt, Skyler Cranmer, Bailey Fosdick, and Frank Marrs

Leaders' decisions in international politics are driven not only by domestic but also by international factors.  However, while policymakers and diplomats have known for a long time that countries influence each other in international decision-making, systematically assessing and estimating this influence has proven to be more difficult.  Previous research has not demonstrated that a latent impact affecting all states persists in the network of treaties and countries, only that abstractly defined policy decisions in one context may influence what another country does.  Here we present findings that allow us, for the first time, to examine the latent influence network that underlies global politics and its impact in one of the most important policy issues of our time: environmental protection.  In light of diplomats' intuition and anecdotal evidence, we develop the argument for a latent influence network and demonstrate how it operates in the context of international environmental agreements.  To test the hypothesis that ratification is affected by a latent network linking treaties and countries, we analyze newly compiled data with an innovative estimator and show that strong positive and negative influences among countries and treaties do exist.  The analysis also examines the predictive power of the model, highlighting that we significantly improve upon earlier work that has failed to incorporate the network effect.  Our findings constitute the first systematic demonstration of an underlying latent influence network in international politics, different from previously identified domestic factors or diffusion effects.

PLOS ONE, 2019

 
Network of Military Alliances, 19

The Contagion of Democracy Through International Networks

with Skyler Cranmer and Bruce Desmarais

Work on democratization typically considers the diffusion of democracy through interstate partnerships. However, such partnerships constitute complex networks that scholars have yet to fully explore as vectors for the spread of democracy. We develop a network theory of democratization which characterizes these networks as epistemic communities that influence elites’ attitudes towards favorable regime types. Our theory predicts, and our empirical strategy confirms, that direct and indirect ties in the alliance network are vectors for democratization. In contrast to conventional wisdom, we find that direct influence is only transmitted through the defensive alliance network and find evidence of higher-order effects.

Social Networks, 2019

 
I Get By With a Little Help from My Friends: Leveraging Campaign Resources to Maximize Congressional Power

with Jan Box-Steffensmeier, Andrew Podob, and Seth Walker

Despite the breadth of scholarship on legislative collaboration, scholars have little empirical understanding of how members of Congress collaborate electorally. We posit there are four main reasons members collaborate electorally, namely that it complements and indirectly benefits their legislative agendas. Using a quasi-experiment of the 2016 election and nearly 3.2 million FEC records from 2010-2016, we build upon earlier scholarship, empirically separating legislative and campaign behaviors and showing the similarities and differences in these processes. Specifically, we find that party and committee membership drive most campaign collaboration, though some is cross-party. This finding sheds light on how campaign behavior can be entirely isolated from legislative behavior yet undertaken to obtain similar ends. While collaboration between members is routinely used to achieve policy goals, we show electoral collaboration mimics legislative collaboration.

American Journal of Political Science, 2020

 

Supervised Learning and Causal Inference

Alliances and Conflict, or Conflict and Alliances? Appraising the Causal Effect of Alliances on Conflict

Decades of research have examined the deterrent or provoking effect of defensive alliances.  It is typically believed that if a state has a relevant defensive alliance, it is less likely to be the target of a militarized dispute.  When theoretically and empirically modeling this relationship, scholars typically assume that alliances are exogenous, that is, that alliances are pseudo-randomly assigned and not influenced by a dyad's baseline probability of experiencing conflict.  This is problematic: while alliances may influence the probability of conflict, the expectation of conflict may influence the probability of alliances.  I synthesize theories of alliance formation and of the alliance-conflict relationship, innovating an endogenous theory of alliances and conflict.  I empirically evaluate this theory in dyadic and systemic perspective, finding that once alliances are endogenized and all relevant observed and unobserved confounders are accounted for, alliances neither deter nor provoke aggression.  This has significant implications for our understanding of interstate conflict and alliance politics.