Data and Programs

Firm-day attention and sentiment data (disclosure of first principal components, PCs, 2012-2021).
Mendeley link to data. Data are in the Social Signal Index folder.


If you use these data, please cite the following paper, which develops, validates and uses the attention and sentiment measures:

The Social Signal by J. Anthony Cookson, Runjing Lu, Marina Niessner and Will Mullins, Journal of Financial Economics. Vol 158 (August 2024). 103870.

Abstract. We examine social media attention and sentiment from three major platforms: Twitter, StockTwits, and Seeking Alpha. We find that, even after controlling for firm disclosures and news, attention is highly correlated across platforms, but sentiment is not: its first principal component explains little more variation than purely idiosyncratic sentiment. Using market events, we attribute differences across platforms to differences in users (e.g., professionals vs. novices) and differences in platform design (e.g., character limits in posts). We also find that sentiment and attention contain different return-relevant information. Sentiment predicts positive next-day returns, but attention predicts negative next-day returns. These results highlight the importance of distinguishing between social media sentiment and attention across different investor social media platforms. In the burgeoning social finance literature, nearly all papers examine single platforms; our paper cautions that attention-related results from these papers are more likely to generalize than results concerning sentiment.
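For readers unfamiliar with the "first principal component" referred to above, the following is a minimal sketch of how a first PC and its explained-variance share can be extracted from three platform-level series. It uses synthetic data only and is not the authors' procedure, which is documented in the paper; the function name `first_pc` and the simulated series are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-ins for three platforms' daily attention series.
# These are simulated numbers, not the published data.
rng = np.random.default_rng(0)
n_days = 500
common = rng.normal(size=n_days)  # shared component across platforms
X = np.column_stack(
    [common + 0.5 * rng.normal(size=n_days) for _ in range(3)]
)


def first_pc(X):
    """Return first-PC scores, loadings, and explained-variance share.

    Hypothetical helper: standardizes each column, then uses the SVD
    of the standardized matrix to obtain the principal components.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]                  # weights on each platform
    scores = Z @ loadings             # first-PC time series
    share = S[0] ** 2 / np.sum(S ** 2)  # fraction of variance explained
    return scores, loadings, share


scores, loadings, share = first_pc(X)
print(f"First PC explains {share:.1%} of the variance")
```

The sign of a principal component is arbitrary, so the scores may need to be flipped to correlate positively with the underlying series.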

Firm-day investor disagreement data (overall, within investment philosophy and across investment philosophies, 2010-2021). Updated. See paper.

README, Stata format, Text format, RData format


If you use these data, please cite the following papers, which develop, validate and use the disagreement measures:

Why don't we agree? Evidence from a social network of investors by J. Anthony Cookson and Marina Niessner, Journal of Finance Vol 75, No 1 (February 2020), pp. 173-228.


Investor Disagreement: Daily Measures from Social Media by J. Anthony Cookson and Marina Niessner, Working paper. 2023.

Abstract (Cookson and Niessner 2020). We study sources of investor disagreement using sentiment of investors from a social media investing platform, combined with information on the users' investment approaches (e.g., technical, fundamental). We examine how much of overall disagreement is driven by different information sets versus differential interpretation of information by studying disagreement within and across investment approaches. Overall disagreement is evenly split between both sources of disagreement, but within-group disagreement is more tightly related to trading volume than cross-group disagreement. Although both sources of disagreement are important, our findings suggest that information differences are more important for trading than differences across market approaches.

Abstract (Cookson and Niessner 2023): Disagreement is pervasive in financial markets. This paper highlights the properties of daily disagreement and daily attention measures derived from the investor social network StockTwits. Daily disagreement and trading volume are strongly related to one another, both in the sample used in Cookson and Niessner (2020) and out of sample through 2021. Disagreement among investors using different investment strategies as well as within them each relate to trading volume, but within-strategy disagreement exhibits a stronger relationship. These findings all hold after controlling for attention, which is also positively related to daily trading volume.

Echo chambers data (stock-daily information siloing and self-stamped disagreement measures, January 2013 through June 2020).

Link to zipped folder and README.


If you use these data, please cite the following publication, which develops, validates and uses these measures:

Echo Chambers by J. Anthony Cookson, Joseph E. Engelberg and William Mullins, Review of Financial Studies, Vol 36, No 2 (February 2023), pp. 450-500.


Abstract. We find evidence of selective exposure to confirmatory information among 400,000 users on the investor social network StockTwits. Self-described bulls are 5 times more likely to follow a user with a bullish view of the same stock than self-described bears. Consequently, bulls see 62 more bullish messages and 24 fewer bearish messages than bears over the same 50-day period. These “echo chambers” exist even among professional investors and are strongest for investors who trade on their beliefs. Finally, beliefs formed in echo chambers are associated with lower ex-post returns, more siloing of information and more trading volume. 

Partisan investor beliefs data (daily sentiment time series of Republican investors versus others, January 2017 through June 2020).

Zipped folder. Contains README, .dta format, .csv format and .RData format.


If you use these data, please cite the following publication, which develops, validates and uses these sentiment measures:

Does partisanship shape investor beliefs? Evidence from the COVID-19 pandemic by J. Anthony Cookson, Joseph E. Engelberg and William Mullins, Review of Asset Pricing Studies, Vol 10, No 4 (December 2020), pp. 863-893.



Abstract. We use party-identifying language—like “liberal media” and “MAGA”—to identify Republican users on the investor social platform StockTwits. Using a difference-in-difference design, we find that partisan Republicans remain relatively unfazed in their beliefs about equities during the COVID-19 pandemic, while other users become considerably more pessimistic. In cross-sectional tests, we find Republicans become relatively more optimistic about stocks that suffered the most during the COVID-19 crisis, but more pessimistic about Chinese stocks. Finally, stocks with the greatest partisan disagreement on StockTwits have significantly more trading in the broader market, explaining 28% of the increase in stock turnover during the pandemic.

Text-Based Innovation Data

Data and README in compressed folder.

Main measure and "negative" text-based innovation, 1990-2010.


Innovation word lists + README in compressed folder.  

This file contains the most salient words from our innovation topic, as well as an alternative list derived from Princeton's WordNet database. Disclaimer: If you use these word lists on corpora outside of our sample, you should provide validation that they are useful in your context.

If you use these data, please cite the following publication, which develops, validates and uses the innovation measures:

A text-based analysis of corporate innovation by Gustaf Bellstam, Sanjai Bhagat, and J. Anthony Cookson, Management Science, Vol 67, No 7 (July 2021), pp. 4004-4031. 

Abstract. We develop a new measure of innovation using the text of analyst reports of S&P 500 firms. Our text-based measure gives a useful description of innovation by firms with and without patenting and R&D (research and development). For nonpatenting firms, the measure identifies innovative firms that adopt novel technologies and innovative business practices (e.g., Walmart’s cross-geography logistics). For patenting firms, the text-based measure strongly correlates with valuable patents, which likely capture true innovation. The text-based measure robustly forecasts greater firm performance and growth opportunities for up to four years, and these value implications hold just as strongly for innovative nonpatenting firms.

Native American Reservation-County Crosswalk Files

Data and README (folder with .csv files).


This data source contains geographic crosswalks between reservations, counties, and headquarters ZIP codes for Native American reservations that had a Native American population exceeding 250 in 1989.
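As a rough illustration of how a crosswalk of this kind is typically used, the sketch below attaches county-level values to reservations and aggregates them to the reservation level. All column names (`reservation_id`, `county_code`, `county_value`), the codes, and the numbers are hypothetical toy values, not the names or contents of the actual .csv files; consult the README for the real layout.

```python
import pandas as pd

# Toy crosswalk: one row per reservation-county pair.
# Column names and codes are illustrative assumptions only.
crosswalk = pd.DataFrame({
    "reservation_id": [1, 1, 2],
    "county_code": ["99001", "99002", "99003"],
})

# Toy county-level data to attach to reservations.
county_data = pd.DataFrame({
    "county_code": ["99001", "99002", "99003"],
    "county_value": [1000, 500, 800],
})

# Many-to-one merge: each crosswalk row picks up its county's value.
# validate= raises an error if county_code is duplicated in county_data.
merged = crosswalk.merge(
    county_data, on="county_code", how="left", validate="many_to_one"
)

# A reservation can span several counties, so aggregate to one row each.
# A simple sum is used purely for illustration.
reservation_totals = (
    merged.groupby("reservation_id", as_index=False)["county_value"].sum()
)
print(reservation_totals)
```

Whether a sum, a mean, or a weighted average is the right aggregation depends on the county variable being attached.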


If you use these data files, please cite the following publication, which collected this information:

Law and finance matter: Lessons from externally-imposed courts by James R. Brown, J. Anthony Cookson and Rawley Z. Heimer. Review of Financial Studies Vol 30, No. 3 (March 2017), pp. 1019-1051.

Abstract. This paper provides novel evidence on the real and financial market effects of legal institutions. Our analysis exploits persistent and externally imposed differences in court enforcement that arose when the U.S. Congress assigned state courts to adjudicate contracts on a subset of Native American reservations. Using area-specific data on small business lending, we find that reservations assigned to state courts, which enforce contracts more predictably than tribal courts, have stronger credit markets. Moreover, the law-driven component of credit market development is associated with significantly higher per capita income, with stronger effects in sectors that depend more on external financing. 

Native American Casinos and Courts Data (Reservation-Level Data)

Data and README (folder with .csv and .RData files).


This data source contains reservation-level information on casino presence, size, whether the reservation is in multiple states, and the development of the tribal court system (as of 1985) for Native American reservations that had a Native American population exceeding 250 in 1989.


If you use these data files, please cite the following publication, which collected this information:

Institutions and casinos: An empirical analysis of the location of Indian casinos by J. Anthony Cookson, Journal of Law and Economics, Vol 53, No. 4 (November 2010), pp. 651-687.

Abstract: This paper empirically investigates the institutional determinants of whether a tribal government invests in a casino. I find that the presence of Indian casinos is strongly related to plausibly exogenous variation in reservations’ legal and political institutions. Tribal governments that can negotiate gaming compacts with multiple state governments, because tribal lands span state borders, had more than twice the estimated probability (.77 versus .32) of operating an Indian casino in 1999. Tribal governments of reservations where contracts are adjudicated in state courts, rather than tribal courts, have more than twice the estimated probability (.76 versus .34) of investing in an Indian casino, ceteris paribus. These findings suggest that states’ political pressures and predictable judiciaries affect incentives to invest in casinos. This study contributes, more generally, to the empirical literature on the effects of institutions by providing new evidence that low-cost contracting is important for taking advantage of substantial investment opportunities. 

Two-step assortative matching code (R code that implements our two-step assortative matching estimator to correct for selection bias).

R Code (Zipped folder with Monte Carlo Exercises and README). 

If you use this code, please cite the following publication, which develops and validates the technique with application to reputation in IPO underpricing:

Assortative matching and reputation in the market for first issues by Oktay Akkus, J. Anthony Cookson and Ali Hortacsu, Management Science, Vol 67, No 4 (April 2021), pp. 2049-2074.


Abstract. Using a tractable structural model of the matching equilibrium between underwriters and equity-issuing firms, we study the determinants of value in underwriter–firm relationships. Our estimates imply that high underwriter prestige is associated with 5.3%–14.1% greater equilibrium surplus. According to the structural model, high prestige exhibits a significant certification effect throughout the sample (1985–2010), but there is also a countervailing effect of underwriter prestige that reflects subscriber preferences for more underpricing. Consistent with trading off profits from issuers and subscribers, high-prestige underwriters underprice more in hot markets when rents to catering to subscribers are greatest.