My primary domains of expertise are moral judgments and decisions in organizational settings, and workplace stereotyping and discrimination. I also have a strong interest in methodological innovations in research and in meta-science, that is, in turning an empirical lens on the research process itself to discover ways to increase the value-add of science for managers, organizations, and society. In recent years my colleagues and I have developed new crowdsourced methodologies and applied them to draw novel insights regarding contemporary morality and stereotyping.
Crowdsourcing data analysis
Our crowdsourcing data analysis approach involves providing the same complex dataset to numerous scientists who independently test the same hypotheses (Silberzahn & Uhlmann, 2015). The first such project recruited a crowd of scientists to collectively test theoretical predictions regarding workplace discrimination. We distributed a large archival dataset on over 300,000 referee decisions to 29 independent teams of analysts to investigate whether players with darker skin tone are more likely to receive red cards during football (soccer) matches (Silberzahn et al., 2018). Although approximately two-thirds of the teams observed a significant effect in the expected direction, effect size estimates ranged all the way from a nonsignificant tendency for lighter-skinned players to receive more red cards to a strong tendency for darker-skinned players to receive more red cards.
In a follow-up project (Schweinsberg et al., 2021), independent analysts used the same dataset to test two hypotheses regarding the effects of scientists’ gender and professional status on verbosity during group meetings. This time, not only the analytic approach but also the operationalizations of key variables were left unconstrained, up to each individual analyst. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination of these. Researchers reported radically different analyses and widely dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. We further found that decisions about how to operationalize variables explained variability in outcomes above and beyond statistical choices (e.g., covariates).
These results demonstrate that defensible but subjective analytic decisions can lead to highly variable effect size estimates. We argue that a high level of transparency regarding the contingency of research results on analytic strategies is particularly important for controversial topics with policy implications for organizations and governments, such as disparities based on race and gender.
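To make the kind of dispersion we observe concrete, the following minimal sketch (in Python) shows one way to summarize many independent teams' conclusions for a single hypothesis: the range and interquartile range of point estimates, and the share of teams reporting significant effects in each direction. The numbers in team_estimates and team_p_values are invented for illustration; they are not the actual red-card or verbosity results.

import numpy as np

# Illustrative team-level results for one hypothesis (invented values,
# NOT the actual project data): each entry is one team's effect estimate
# (e.g., a standardized coefficient) and its two-sided p-value.
team_estimates = np.array([0.21, 0.05, -0.03, 0.34, 0.12, 0.18, -0.08,
                           0.27, 0.09, 0.15, 0.02, 0.41, 0.11, 0.19])
team_p_values = np.array([0.01, 0.40, 0.75, 0.001, 0.10, 0.03, 0.55,
                          0.004, 0.22, 0.04, 0.90, 0.0005, 0.20, 0.02])

alpha = 0.05
significant = team_p_values < alpha

# Spread of point estimates across teams
low, q25, median, q75, high = np.percentile(team_estimates, [0, 25, 50, 75, 100])
print(f"Range of estimates: {low:+.2f} to {high:+.2f}")
print(f"Median {median:+.2f}, interquartile range {q25:+.2f} to {q75:+.2f}")

# How many teams would declare a significant effect, and in which direction?
pos_sig = int(np.sum(significant & (team_estimates > 0)))
neg_sig = int(np.sum(significant & (team_estimates < 0)))
print(f"{pos_sig} of {len(team_estimates)} teams report a significant positive effect")
print(f"{neg_sig} of {len(team_estimates)} teams report a significant negative effect")

A summary of this kind communicates, in a few numbers, how much the same dataset can support different conclusions depending on the analytic path taken.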
Crowdsourcing hypothesis tests
Our crowdsourcing hypothesis tests approach (Landy et al., 2020) similarly examines the extent to which research results are influenced by subjective decisions that scientists make as they design studies. In our first such project, fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit racial bias. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across the different sets of materials designed to test the same hypothesis: materials from different teams yielded statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses and a lack of support for three hypotheses. Practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. As with crowdsourcing the analysis of a complex dataset, recruiting a crowd of scientists to independently design experimental paradigms to test the same research hypothesis helps reveal the true consistency of empirical support for a scientific claim.
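As a rough illustration of the aggregation step, the sketch below implements a standard DerSimonian-Laird random-effects meta-analysis over team-level Cohen's d values for a single hypothesis. The effect sizes and sample sizes are invented placeholders rather than the Landy et al. (2020) estimates, and the actual project also drew on Bayesian analyses not shown here.

import numpy as np

# Illustrative per-team results for one hypothesis (invented values):
# Cohen's d from each team's study design, with the per-group sample
# size used to approximate the sampling variance of d.
d = np.array([0.26, -0.10, 0.15, 0.08, -0.02, 0.31, 0.05, 0.12])
n_per_group = np.array([500, 480, 510, 495, 520, 470, 505, 490])

# Approximate sampling variance of Cohen's d for a two-group comparison
var_d = 2.0 / n_per_group + d**2 / (4.0 * n_per_group)

# Fixed-effect weights and the Q statistic for heterogeneity
w = 1.0 / var_d
d_fixed = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - d_fixed) ** 2)
k = len(d)

# DerSimonian-Laird estimate of between-team heterogeneity (tau^2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects pooled estimate and its standard error
w_re = 1.0 / (var_d + tau2)
d_pooled = np.sum(w_re * d) / np.sum(w_re)
se_pooled = np.sqrt(1.0 / np.sum(w_re))

print(f"Pooled d = {d_pooled:+.3f} (SE = {se_pooled:.3f}), tau^2 = {tau2:.4f}")
print(f"95% CI: {d_pooled - 1.96*se_pooled:+.3f} to {d_pooled + 1.96*se_pooled:+.3f}")

The tau^2 term captures heterogeneity across teams' designs: when the same hypothesis yields very different effect sizes across materials, the between-study variance, rather than the pooled point estimate, carries much of the story.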
A creative destruction approach to replication
Another line of research from our group seeks to further increase the already high informational value of replications. Drawing on the concept of a gale of creative destruction in a capitalist economy, we argue that initiatives to assess the robustness of findings in the organizational literature should aim to simultaneously test competing ideas operating in the same theoretical space (Tierney et al., 2020, 2021). In other words, replication efforts should seek not just to support or question the original findings, but also to replace them with revised, stronger theories with greater explanatory power. Achieving this may require adding new measures, conditions, and subject populations to research designs in order to carry out conceptual tests of multiple theories in addition to directly replicating the original findings.
One recent project applied this creative destruction approach to research on motivated gender discrimination (Tierney et al., 2020). The original research found that evaluators shift hiring criteria in favour of male applicants for stereotypically male jobs, an effect especially pronounced among decision makers who view themselves as rational and objective (Uhlmann & Cohen, 2005, 2007). In the replication, we pitted this motivated discrimination account against cognitive accounts of assimilation to stereotype-based assumptions, the possibility that due to feminist messaging and genuine ideology shifts evaluators now favour female over male candidates (the motivated liberalism account), and the study-savviness account, in which participants know what the study is about and exhibit pro-female judgments to avoid appearing sexist. In a pattern almost directly opposite to the original research, the replication found overall favouritism towards female candidates among male evaluators in terms of both hiring criteria and selection decisions, biases that were especially strong among individuals who believed themselves to be rational and objective. The study-savviness and motivated liberalism accounts both received empirical support, in that participants who had previously completed similar studies, or who strongly rejected sexist beliefs, tended to favour female over male applicants. The novel conditions, measures, and populations thus increased the information gain from the replication, allowing us not only to support or fail to support the original theorizing, but also to generate positive evidence for alternative theoretical accounts.