Friday, August 19, 2022

Social capital I: measurement and associations with economic mobility

Sample construction

This section describes the methods used to generate the data analysed in this paper. A server-side analysis script was designed to automatically process the raw data, strip the data of personal identifiers, and generate aggregate results, which we analysed to produce the conclusions in this paper. The script then promptly deleted the raw data generated for this project (see the 'Privacy and ethics' section).

We work with privacy-protected data from Facebook. Survey data show that more than 69% of the US adult population used Facebook in 2019, and about three-quarters of those individuals did so daily [37]. The same survey also found that Facebook usage rates are similar across income groups, education levels and racial groups, as well as among urban, rural and suburban residents; usage rates are lower among older adults and slightly higher among women than among men.

Starting from the raw Facebook data as of 28 May 2022, we constructed our primary analysis sample by limiting the data to users aged between 25 and 44 years who reside in the United States, were active on the Facebook platform at least once in the previous 30 days, had at least 100 US-based Facebook friends and had a ZIP code. Our final analysis sample consists of 72.2 million Facebook users, who constitute 84% of the US population between ages 25 and 44 years (based on a comparison to the 2014–2018 American Community Survey (ACS)). We focus on the 25–44-year age range because previous work [37] has documented that its Facebook usage rate is above 80%, higher than for other age groups. In addition, the ACS publicly releases demographic data for certain age groups, one of which is ages 25–44 years; this allows us to compare our sample with the full population and to use ACS aggregates to predict SES ('Variable definitions').

We do not link any external individual-level information to the Facebook data. However, we use various publicly available sources of aggregate statistics to supplement our analysis, including data on median incomes by block group from the 2014–2018 ACS, data on economic mobility by Census tract and county from the Opportunity Atlas [72], and measures of county-level and ZIP-level characteristics, such as the share of the population by race and ethnicity and the share of single parents, from the ACS and the Census. We describe these data in detail in Supplementary Information A.5.

Variable definitions

We construct the following sets of variables for each person in our analysis sample. We measured these variables on 28 May 2022.

Friendship links

The data contain information on all friendship links between Facebook users. We focus exclusively on friendships within our analysis sample; that is, we exclude friendships with people aged below 25 years or above 44 years, people who live outside the United States, and people who do not satisfy any of our other criteria for inclusion in the analysis sample.

Facebook friendship links must be confirmed by both parties, and most Facebook friendship links are between individuals who have interacted in person [85]. The Facebook friendship network can therefore be interpreted as providing data on people's real-world friends and acquaintances rather than purely online connections. Because individuals tend to have many more friends on Facebook than they interact with regularly, we also verify that our results hold when focusing on an individual's ten closest friends, where closeness is measured by the frequency of public interactions such as likes, tags, wall posts and comments.


Location

Following prior work [86], we use location data to construct statistics at various geographical levels. Each individual is assigned a residential ZIP code and county based on information and activity on Facebook, including the city reported on Facebook profiles as well as device and connection information. Formally, we use 2010 Census ZIP code tabulation areas (ZCTAs) to perform all geographical analyses of ZIP-code-level data; we refer to these ZCTAs as ZIP codes for simplicity. According to the 2014–2018 ACS, there are 219,214 Census block groups, 32,799 ZIP codes and 3,220 counties, with average populations of 1,488, 9,948 and 101,332, respectively.

Socioeconomic status

We construct a model that generates a composite measure of socioeconomic status (SES) for working-age adults (individuals between the ages of 25 and 64 years) by combining various characteristics. We construct our baseline SES measure in three steps, which are described in greater detail in Supplementary Information B.1.

First, for Facebook users who have location history (LH) settings enabled, we use the ACS to obtain the median household income of their Census block group. LH is an opt-in setting for Facebook accounts that allows the collection and storage of location signals provided by a device's operating system while the app is running. We observe Census block groups only for individuals in the LH subsample; for individuals who do not have LH enabled, we can assign only ZIP codes. If an individual subsequently opts out of LH, their previously stored location signals are not retained.

Second, we estimate a gradient-boosted regression tree to predict these median household incomes using variables observed for all individuals in our sample, such as age, sex, language, relationship status, location information (ZIP code), college, donations, phone model price and mobile carrier, usage of Facebook on the web (rather than on a mobile device), and other variables related to Facebook usage listed in Supplementary Table 4. We use this model to generate SES predictions for all individuals in our sample.
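The second step can be illustrated with a minimal sketch of gradient boosting for squared-error loss, using depth-one regression trees (stumps) as base learners in plain Python. This is not the model used in the paper: the two features (a ZIP-level income and a phone price), the synthetic target standing in for block-group median household income, the learning rate and the number of rounds are all hypothetical. As in the text, the model is fitted only on users who have the target (here, the first 200 users play the role of the LH subsample) and is then used to predict for everyone.

```python
import random
from statistics import mean


def fit_stump(X, residuals):
    """Fit a depth-one regression tree to the residuals.

    Scans every feature and every split between consecutive distinct values,
    keeping the split that minimises the within-side sum of squared errors.
    """
    n = len(X)
    total = sum(residuals)
    best = None  # (score, feature, threshold, left_mean, right_mean)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # no threshold separates equal feature values
            right_sum = total - left_sum
            # SSE = sum(r^2) - S_L^2/n_L - S_R^2/n_R, so maximise the last two terms.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, f, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    _, feat, thr, left_mean, right_mean = best
    return lambda row: left_mean if row[feat] <= thr else right_mean


def fit_gbrt(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each stump fits current residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


random.seed(0)
# Hypothetical features: [ZIP median income ($1,000s), phone price ($100s)].
X = [[random.uniform(30, 120), random.uniform(1, 12)] for _ in range(300)]
# Synthetic stand-in for block-group median household income ($1,000s).
y = [0.8 * z + 3.0 * p + random.gauss(0, 5) for z, p in X]

# Fit only on the first 200 users (playing the role of the LH subsample) ...
model = fit_gbrt(X[:200], y[:200])
# ... then generate SES predictions for every user in the sample.
ses_pred = [model(row) for row in X]
```

A production model would use many more features and a library implementation with deeper trees and tuned hyperparameters; the sketch only shows the train-on-LH, predict-for-all structure.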

Finally, individuals (including the LH users in the training sample) are assigned percentile ranks in the national SES distribution on the basis of their predicted SES relative to others in the same birth cohort.
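A minimal sketch of the third step, assuming that birth year defines the cohort and using a mid-rank percentile convention (tied values share their average rank, rescaled as 100 × (rank − 0.5)/n). The exact percentile convention is not stated here, so treat it as an assumption; the person identifiers and SES values are hypothetical.

```python
from collections import defaultdict


def percentile_ranks_by_cohort(people):
    """people: iterable of (person_id, birth_year, predicted_ses).

    Returns person_id -> percentile rank in (0, 100), computed within birth cohort.
    Tied SES values share the average of the ranks they occupy.
    """
    cohorts = defaultdict(list)
    for pid, year, ses in people:
        cohorts[year].append((ses, pid))
    ranks = {}
    for members in cohorts.values():
        members.sort(key=lambda m: m[0])
        n = len(members)
        i = 0
        while i < n:
            j = i
            while j + 1 < n and members[j + 1][0] == members[i][0]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
            for k in range(i, j + 1):
                ranks[members[k][1]] = 100 * (avg_rank - 0.5) / n
            i = j + 1
    return ranks


people = [
    ("a", 1980, 52_000), ("b", 1980, 31_000), ("c", 1980, 75_000),
    ("d", 1990, 40_000), ("e", 1990, 40_000),
]
ranks = percentile_ranks_by_cohort(people)
```

Ranking within cohorts means that a person sits at the 50th percentile when their predicted SES is the median of their own cohort, regardless of how SES levels differ across cohorts.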

We do not use any information from an individual's friends to predict their SES. This ensures that errors in the SES predictions are not correlated across friends; such correlated errors would bias our estimates of homophily by SES. We also do not use direct information on individuals' incomes or wealth, as we do not observe these variables at the individual level in our data. Nevertheless, we show below that our measures of SES are highly correlated with external measures of income across subgroups.

The algorithm described above is one of many possible ways of combining a set of underlying proxies for SES into a single measure. To verify that our findings are not sensitive to the specific variables or algorithm used to predict SES, we show that our results are similar when we use a simple unweighted average of z-scores of the underlying proxies, or when we directly use ZIP code median household incomes for all users, eschewing the prediction model and other proxies entirely (Supplementary Table 5).
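The unweighted z-score alternative can be sketched as follows. Each proxy is standardized across the sample (population standard deviation), and each person's composite is the plain mean of their standardized proxies. The two proxies and their values are hypothetical placeholders for the underlying SES proxies.

```python
from statistics import mean, pstdev


def zscore_composite(rows):
    """Unweighted average of z-scores across SES proxies.

    rows: one list of proxy values per person, with proxies in the same order
    for everyone. Every proxy must vary; a constant column has zero standard
    deviation and would raise ZeroDivisionError.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [mean([(x - mu) / sd for x, (mu, sd) in zip(row, stats)]) for row in rows]


# Hypothetical proxies per person: [ZIP median household income, phone price].
people = [[45_000, 300], [62_000, 900], [88_000, 1_100]]
composite = zscore_composite(people)
```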

Parental SES

We link individuals in our primary analysis sample to their parents (who may not be in the analysis sample themselves) to construct measures of family SES during childhood. To link individuals to their parents, we use self-reported familial ties, a hash of user last names, and public user-generated wall posts and major life events (see Supplementary Information A.2 for details). We then use the SES of parents, constructed using the algorithm described above, to assign parental SES to individuals. Finally, we assign individuals a parental SES rank on the basis of their predicted parental SES relative to others in the same birth cohort. We are able to assign parental SES ranks for 31% of the individuals in our primary analysis sample.

High school friendships

To identify friendships made in high school, we first use self-reports to assign individuals to schools. For those who do not report a high school, we use data on their friendship networks to impute one (see Supplementary Information A.3 for details). For the 3.3% of users who report multiple high schools, we select the school at which the user has the largest number of friends. This process yields high school information for 74.9% of individuals in our analysis sample. Finally, if an individual and one of their friends attended the same high school within three cohorts of each other, we identify them as high school friends.
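The two rules that act on already-assigned schools can be sketched as follows: choosing among several reported schools by counting friends at each, and flagging high school friendships. This sketch assumes that "cohort" means birth year and that "within three cohorts" means birth years differing by at most three; the network-based imputation for non-reporters (Supplementary Information A.3) is not modelled, and all identifiers are hypothetical.

```python
from collections import Counter


def pick_school(reported_schools, friend_ids, school_of):
    """For a user reporting several high schools, pick the one attended by the
    most of their friends (ties go to the school listed first)."""
    counts = Counter(school_of.get(f) for f in friend_ids)
    return max(reported_schools, key=lambda s: counts[s])


def high_school_friend_pairs(friend_pairs, school_of, birth_year, window=3):
    """Friend pairs assigned to the same high school whose birth years differ by
    at most `window` (assumed reading of 'within three cohorts')."""
    return [
        (i, j)
        for i, j in friend_pairs
        if school_of.get(i) is not None
        and school_of.get(i) == school_of.get(j)
        and abs(birth_year[i] - birth_year[j]) <= window
    ]


school_of = {"a": "S1", "b": "S1", "c": "S1", "d": "S2"}
birth_year = {"a": 1980, "b": 1982, "c": 1985, "d": 1980}
pairs = high_school_friend_pairs([("a", "b"), ("a", "c"), ("a", "d")], school_of, birth_year)
```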


Extended Data Table 4a shows summary statistics for our baseline sample and, for comparison, for those aged between 25 and 44 years in the 2014–2018 ACS. The Facebook sample is similar to the full population in terms of age, sex and language. Consistent with previous work [87], women are slightly over-represented in our Facebook sample (53.6%) relative to men. The median individual in our analysis sample has 382 in-sample Facebook friends; in total, there are just under 21 billion friendship pairs between individuals in the sample.

As much of our analysis relies on variation across areas, it is important that our sample has good coverage not just nationally but also across locations. In Supplementary Information A.1, we show that our sample has high coverage rates across the United States, and that coverage rates do not vary systematically across locations with different income levels or demographic characteristics.

Most of our analysis draws on the SES measure constructed as described in the previous subsection. We evaluate the accuracy of this SES measure by correlating the share of households with above-median income in each ZIP code, from the ACS, with the estimated share of Facebook users with above-median SES in our sample. The population-weighted correlation between our estimates of the share of high-SES individuals and the ACS estimates at the ZIP-code level is 0.88. Moreover, there are similarly high correlations between our estimates of the share of high-SES households and corresponding statistics drawn from external, publicly available administrative datasets at the high school and college levels (see the companion paper [9] for details).
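The validation statistic is a population-weighted Pearson correlation across ZIP codes. A minimal sketch, where the ZIP-level shares and populations are hypothetical numbers, not data from the paper:

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / sqrt(vx * vy)


# Hypothetical ZIP codes: ACS share of households above the national median,
# estimated share of Facebook users above median SES, and population weights.
acs_share = [0.21, 0.48, 0.63, 0.35, 0.80]
fb_share = [0.25, 0.44, 0.70, 0.31, 0.76]
population = [12_000, 8_500, 30_000, 4_200, 15_000]
print(round(weighted_corr(acs_share, fb_share, population), 3))
```

Weighting by population keeps small, noisy ZIP codes from dominating the correlation; rescaling all weights by a constant leaves the result unchanged.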

For some parts of our analysis, specifically for computing measures of EC during childhood, we focus on the subsample of individuals whom we can link to parents with an SES prediction and whom we can assign to a high school on the basis of self-reports and network-based imputations. Panel B of Extended Data Table 4 presents summary statistics for this subsample of 19.4 million users, or about 27% of the full analysis sample. The characteristics of this subsample are broadly similar to those of the full sample, although users whom we can link to high schools and to parents with SES predictions are about 2 years younger on average than users in the full sample, largely because our approach does not allow us to assign SES predictions for parents older than 65 years. County-level median household incomes differ by $876 between the samples, about 6% of a standard deviation.

We further evaluate our SES measure and parental linkages by comparing estimates of intergenerational economic mobility based on our SES proxies with publicly available estimates based directly on household incomes from population-level tax data. There is a linear relationship between individuals' and their parents' SES ranks across the distribution of parental SES, with a slope of 0.32 (Extended Data Fig. 2). This is similar to the estimated slope of 0.34 in population tax data [10], supporting the validity of both our SES imputations and our parental linkages.
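The rank-rank slope is the coefficient from an ordinary least squares regression of child SES rank on parental SES rank. A minimal sketch with synthetic ranks generated to have a true slope of 0.32; the intercept, noise level and sample size are arbitrary, and the synthetic child values are not re-ranked, so they are only illustrative.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


# Synthetic illustration: child rank rises 0.32 percentiles per parental percentile.
random.seed(1)
parent_rank = [random.uniform(0, 100) for _ in range(20_000)]
child_rank = [34 + 0.32 * p + random.gauss(0, 15) for p in parent_rank]
slope = ols_slope(parent_rank, child_rank)
```

With a slope of 0.32, children whose parents are 10 percentiles higher in the parental SES distribution are, on average, 3.2 percentiles higher in their own SES distribution.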

We conclude that our Facebook analysis samples are representative of the populations we seek to study and that our measures of SES align with external data.

Measuring connectedness

Economic connectedness


Let

$$f_{Q,i} \equiv \frac{[\text{Number of friends in SES quantile } Q]_i}{\text{Total number of friends}_i} \tag{1}$$


denote individual $i$'s share of friends from SES quantile $Q$. To obtain measures of the degree of homophily that are not sensitive to the size of each quantile bin, we normalize $f_{Q,i}$ by the share of individuals in the sample who belong to quantile $Q$, $w_Q$ (for example, $w_Q = 0.1$ for deciles). We then define person $i$'s individual EC (IEC) to individuals from quantile $Q$ as

$$\mathrm{IEC}_{Q,i} \equiv \frac{f_{Q,i}}{w_Q}. \tag{2}$$


We define the level of EC in community (county or ZIP code) $c$ as the mean level of individual EC of low-SES (for example, below-median) members of that community:

$$\mathrm{EC}_c = \frac{\sum_{i \in L \cap c} \mathrm{IEC}_i}{N_{Lc}}, \tag{3}$$


where $N_{Lc}$ is the number of low-SES individuals in community $c$. When defining EC in a given community, we continue to rank individuals in the national SES distribution and include friendships with individuals living outside that community. In the presence of homophily, EC ranges from 0 to 1; a value of 1 indicates, for example, that half of below-median-SES individuals' friends have above-median SES.
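Equations (1) to (3) can be sketched for a median split, where the quantiles are labelled "low" and "high" and $w_Q = 0.5$. The toy network and labels are hypothetical; "d" and "x" live outside the community but still count as friends, and people with no friends are assumed to have been excluded beforehand (which the 100-friend sample restriction guarantees).

```python
def individual_ec(i, friends, group, q, w_q):
    """Equations (1)-(2): share of i's friends in group q, divided by q's population share."""
    fs = friends[i]
    f_qi = sum(1 for j in fs if group[j] == q) / len(fs)
    return f_qi / w_q


def community_ec(members, friends, group):
    """Equation (3) with a median split: mean IEC toward high-SES friends over the
    low-SES members of the community. Friends living elsewhere still count."""
    low = [i for i in members if group[i] == "low"]
    return sum(individual_ec(i, friends, group, "high", 0.5) for i in low) / len(low)


# Hypothetical toy network; "d" and "x" live outside the community.
friends = {
    "a": ["b", "c", "d", "x"],
    "b": ["a", "x"],
    "c": ["a"],
    "d": ["a"],
    "x": ["a", "b"],
}
group = {"a": "low", "b": "low", "c": "high", "d": "high", "x": "high"}
community = ["a", "b", "c"]
ec = community_ec(community, friends, group)
```

Person b has exactly half of their friends above the median, so their IEC is 1; person a has three of four, giving 1.5, and the community EC is their mean, 1.25.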

We construct standard errors for EC in each location using a bootstrap resampling method that adjusts for correlations in connectedness across individuals arising from shared pools of friends (Supplementary Information B.3). Because sample sizes are large, virtually none of the geographical variation in EC is due to sampling variation. At the county level, the mean standard error of 0.004 is more than an order of magnitude smaller than the signal standard deviation of EC across counties, 0.18. When we randomly split the microdata into two halves and estimate EC by county in each half, we obtain a split-sample correlation (reliability) of 0.999 across counties, weighting by the number of people in each county with household income below the national median. The ZIP-code-level estimates we release are also precise, with a split-sample reliability of 0.99 (pooling all ZIP codes in the United States) when weighted by below-median-income population.
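The split-sample reliability check can be sketched as follows. This simplified version assigns individuals to halves independently at random, takes each county's EC to be the mean IEC of its low-SES residents in that half, and computes a weighted correlation across counties. It does not reproduce the bootstrap in Supplementary Information B.3 that accounts for shared friend pools, and the input format and synthetic data are hypothetical.

```python
import random
from math import sqrt


def weighted_corr(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / sqrt(vx * vy)


def split_sample_reliability(records, weights, seed=0):
    """records: (county, iec) pairs for low-SES individuals.
    weights: county -> weight (e.g. below-median-income population)."""
    rng = random.Random(seed)
    halves = ({}, {})
    for county, iec in records:
        half = halves[rng.random() < 0.5]  # False -> first half, True -> second half
        half.setdefault(county, []).append(iec)
    counties = [c for c in weights if c in halves[0] and c in halves[1]]
    ec_a = [sum(halves[0][c]) / len(halves[0][c]) for c in counties]
    ec_b = [sum(halves[1][c]) / len(halves[1][c]) for c in counties]
    return weighted_corr(ec_a, ec_b, [weights[c] for c in counties])


# Synthetic check: 50 counties with spread-out true EC, 400 noisy individuals each.
rng = random.Random(42)
weights, records = {}, []
for county in range(50):
    true_ec = rng.gauss(0.8, 0.18)
    weights[county] = 1.0
    records.extend((county, true_ec + rng.gauss(0, 0.3)) for _ in range(400))
reliability = split_sample_reliability(records, weights)
```

When between-county variation in EC is large relative to within-county noise, as in this synthetic setup, the two halves agree closely and the reliability approaches 1.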

Childhood EC

We construct two measures of childhood EC: one based on links between individuals and their parents in our Facebook analysis sample, and another based on data from Instagram.

To measure childhood EC in the Facebook sample, we restrict the sample to individuals whom we could link to high schools and to their parents (about 27% of the full sample). We assign parental SES ranks (estimated using the machine-learning algorithm described in the 'Variable definitions' section) within this subsample, ranking individuals on the basis of parental SES relative to others in the same birth cohort. We then measure $f_{Q,i}$ as the share of friends from parental-SES quantile $Q$ within the subset of friends from high school: friends who attended the same high school and are within three cohorts of the individual (so that they would most likely have overlapped in school). Ideally, we would directly observe all friendships made during childhood. However, because the Facebook platform was not available when the members of the birth cohorts we analyse were growing up, we use current friends who attended the same high school to identify friendships made in childhood. When calculating childhood EC by location, we assign individuals to the counties where their high schools are located, rather than the counties where they currently live, to map people to the places where they grew up. We do not produce ZIP-code-level measures of childhood EC because we cannot reliably infer individuals' childhood ZIP codes from the locations of their high schools (children from many ZIP codes may attend a given school).

To measure childhood EC for users of Instagram, a widely used social networking platform owned by Meta, we restrict the raw Instagram data to personal users (not business pages) in the United States who had not deactivated their account, had been active on the platform within the past 30 days, and were predicted to be between 13 and 17 years of age as of 28 May 2022 (see Supplementary Information A.4 for further details). Next, we assign the individuals in this sample to ZIP codes on the basis of their IP address and other features. We then assign Instagram users an SES estimate on the basis of two variables: (1) the median household income of their residential ZIP code, from publicly available data on incomes in the 25–44-year age bin from the 2014–2018 ACS; and (2) the price of their phone. We construct a weighted z-score of these two inputs, placing two-thirds of the weight on median household income and one-third on phone price. The higher weight on ZIP-code-based income relative to phone price reflects the particularly large role that ZIP codes played in the machine-learning model used to construct our baseline measures of SES in the Facebook data (although using other weights in the construction of the z-score produced similar results). We rank users nationally on the basis of these weighted z-scores to assign them an SES percentile rank. Users above the 50th percentile are termed high SES, whereas those at or below the 50th percentile are termed low SES. Finally, we construct measures of individual EC as defined in equation (2). Because ties on Instagram, termed 'follows', are directional (one person can follow another without being followed back), we restrict our attention to reciprocal followers when measuring connectedness, to mimic friendships on Facebook.
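The Instagram SES assignment and the reciprocal-follow filter can be sketched as follows. The 2/3 and 1/3 weights and the 50th-percentile cutoff come from the text; the percentile convention (100 × position / n, ties broken arbitrarily), the user identifiers and the input values are assumptions for illustration.

```python
from statistics import mean, pstdev


def instagram_ses_group(users):
    """users: user id -> (ZIP median household income, phone price).

    Builds a weighted z-score (2/3 on ZIP income, 1/3 on phone price), ranks users
    nationally, and labels those above the 50th percentile "high", others "low".
    """
    ids = list(users)
    inc = [users[u][0] for u in ids]
    price = [users[u][1] for u in ids]
    mi, si = mean(inc), pstdev(inc)
    mp, sp = mean(price), pstdev(price)
    score = {
        u: (2 / 3) * (users[u][0] - mi) / si + (1 / 3) * (users[u][1] - mp) / sp
        for u in ids
    }
    n = len(ids)
    # Percentile rank = 100 * position / n (assumed convention; ties broken arbitrarily).
    pct = {u: 100 * (k + 1) / n for k, u in enumerate(sorted(ids, key=score.get))}
    return {u: "high" if pct[u] > 50 else "low" for u in ids}


def reciprocal_ties(follows):
    """Keep only mutual follows, returned as undirected (sorted) pairs."""
    directed = set(follows)
    return {tuple(sorted(p)) for p in directed if (p[1], p[0]) in directed and p[0] != p[1]}


users = {"u1": (40_000, 200), "u2": (90_000, 1_200), "u3": (60_000, 800), "u4": (30_000, 100)}
groups = instagram_ses_group(users)
ties = reciprocal_ties([("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u3", "u4"), ("u4", "u3")])
```

The one-way follow from u1 to u3 is dropped, so only the two mutual pairs would enter the IEC calculation.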

Each of the two measures of childhood EC has advantages and limitations. The Facebook parental SES measure captures the childhood friendships of individuals in approximately the same set of cohorts for which we measure economic mobility. However, because we can construct this measure only for the 27% of individuals whom we can link to parents and who report their high school, these estimates are noisier and potentially less representative than our baseline estimates. The Instagram data do not require parental linkage and capture all friends, not just high school friends, thereby producing a larger and more comprehensive sample. The limitation of the Instagram EC measure is that it measures EC among the 2005–2009 birth cohorts, rather than the 1978–1983 cohorts for which we measure economic mobility. Nevertheless, the stability of both economic mobility [72] and EC (Supplementary Fig. 1) within a location over time mitigates the effects of this misalignment.

Measuring cohesiveness

We represent a set of friendships by the matrix $\mathbf{A} \in \{0, 1\}^{n \times n}$, where $A_{ij} = 1$ denotes the existence of a friendship (edge) between individuals $i$ and $j$, and $A_{ij} = 0$ denotes its absence. We focus on three measures of the structure of $\mathbf{A}$: clustering and support ratio, which measure local correlation in friendships, and spectral homophily, a measure of overall network fragmentation. Other measures of cohesiveness, such as algebraic connectivity [88], are also informative but are difficult to compute or even approximate for networks of the size we analyse. The three measures we focus on here have the advantage of being computationally tractable in large samples.


Clustering

Previous work [33] has argued that if person $i$ is friends with both persons $j$ and $k$, then having $j$ and $k$ be friends with each other can help them jointly pressure and sanction person $i$, thereby helping to enforce norms. Motivated by this logic, many studies have measured the extent of such 'network closure' by the degree of clustering within a person's network: the frequency with which two friends of that person are in turn friends with each other. Letting $N_i(\mathbf{A})$ denote the set of $i$'s friends and $d_i(\mathbf{A})$ its cardinality (the number of friends $i$ has), the clustering of $i$'s network is defined as

$$\mathrm{Clustering}_i(\mathbf{A}) = \sum_{k, j \in N_i(\mathbf{A}),\, k < j} \frac{A_{kj}}{d_i(\mathbf{A})\,(d_i(\mathbf{A}) - 1)/2}. \tag{4}$$


We measure clustering in a community $c$ as the average of equation (4) across people living in that community:

$$\mathrm{Clustering}_c = \frac{\sum_{i \in c} \mathrm{Clustering}_i(\mathbf{A})}{N_c}. \tag{5}$$
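Equations (4) and (5) can be sketched with an adjacency-set representation of the full friendship network $\mathbf{A}$. Clustering is undefined for people with fewer than two friends; returning 0 for them is an assumption of this sketch (in the analysis sample everyone has at least 100 friends). The toy graph is hypothetical: a triangle a, b, c plus a pendant friend d of a.

```python
from itertools import combinations


def clustering(i, adj):
    """Equation (4): fraction of pairs of i's friends who are friends with each other.

    adj: node -> set of friends (the full friendship network A).
    """
    nbrs = adj[i]
    d = len(nbrs)
    if d < 2:
        return 0.0  # undefined for fewer than two friends; 0 here by assumption
    linked = sum(1 for j, k in combinations(nbrs, 2) if k in adj[j])
    return linked / (d * (d - 1) / 2)


def community_clustering(members, adj):
    """Equation (5): average individual clustering over the community's members."""
    return sum(clustering(i, adj) for i in members) / len(members)


adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
avg = community_clustering(["a", "b", "c", "d"], adj)
```

Person a has three friend pairs, of which only b and c are linked, so a's clustering is 1/3; b and c each score 1, and the community average is 7/12.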


Support ratio

Letting $\mathbf{A}^c$ denote the subset of friendships between individuals who are both members of community $c$, we measure community $c$'s support ratio as the overall frequency with which pairs of friends have at least one friend in common, considering only the people and friendships within that community:

$$\text{Support ratio}_c = \frac{\left|\{(ij) : i, j \in c,\ A^c_{ij} = 1,\ [(\mathbf{A}^c)^2]_{ij} > 0\}\right|}{\left|\{(ij) : i, j \in c,\ A^c_{ij} = 1\}\right|}. \tag{6}$$
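Equation (6) can be sketched by restricting the network to the community and checking, for each within-community friendship, whether the two endpoints share a within-community friend, which is exactly the condition $[(\mathbf{A}^c)^2]_{ij} > 0$. The graph is the same hypothetical toy network as above, and node identifiers are assumed to be comparable so that each undirected tie is counted once.

```python
def support_ratio(members, adj):
    """Equation (6): share of within-community friendships whose two endpoints
    have at least one common friend inside the community.

    adj: node -> set of friends in the full network; only ties within `members`
    are used. Assumes the community contains at least one friendship.
    """
    c = set(members)
    sub = {i: adj[i] & c for i in c}  # A^c: friendships restricted to the community
    edges = [(i, j) for i in c for j in sub[i] if i < j]  # each undirected tie once
    supported = sum(1 for i, j in edges if sub[i] & sub[j])  # [(A^c)^2]_ij > 0
    return supported / len(edges)


adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
full = support_ratio(["a", "b", "c", "d"], adj)
without_c = support_ratio(["a", "b", "d"], adj)
```

Dropping c from the community removes the only common friend of a and b, so the support ratio falls from 3/4 to 0, illustrating why the measure depends only on ties inside the community.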


Spectral homophily

Spectral homophily measures the extent to which a network is fragmented into separate groups, and relates to the speed of information aggregation in a network. A wide variety of algorithms can detect subcommunities [89]; spectral homophily provides a simple measure of how strongly a network splits into such subcommunities. Formally, spectral homophily is the second-largest eigenvalue of the degree-normalized (row-stochasticized) adjacency matrix $\mathbf{A}^c_{\mathbf{s}} \in [0,1]^{n \times n}$. We measure spectral homophily in each county on the basis of the set of friendships among individuals in our primary sample living in that county. Friendship matrices are too sparse to estimate spectral homophily reliably at the ZIP code level. In the rare instances in which a county contains fully isolated nodes, we calculate spectral homophily on the largest connected component, which usually makes up the vast majority of users living in the county.
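A minimal sketch of the computation, assuming NumPy is available and the county network fits in a dense matrix (real county networks would need sparse eigensolvers). Isolated nodes have zero degree, so the row-stochastic matrix is undefined for them, which is why the sketch first keeps the largest connected component. The toy graph is hypothetical.

```python
from collections import deque

import numpy as np


def largest_component(adj):
    """Nodes of the largest connected component (breadth-first search)."""
    seen, best = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best


def spectral_homophily(adj):
    """Second-largest eigenvalue of the row-stochastic adjacency matrix D^-1 A,
    computed on the largest connected component (needs at least one edge)."""
    nodes = largest_component(adj)
    index = {u: k for k, u in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in adj[u]:
            A[index[u], index[v]] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    # D^-1/2 A D^-1/2 is symmetric and similar to D^-1 A, so it has the same real
    # eigenvalues; eigvalsh returns them in ascending order.
    sym = A * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    return float(np.linalg.eigvalsh(sym)[-2])


# Two triangles joined by a single edge (2-3), plus an isolated node that is dropped.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}, "z": set()}
lam2 = spectral_homophily(adj)
```

For this graph the value is (1 + √73)/12 ≈ 0.795, close to 1 because the network nearly splits into two groups; two fully disconnected groups would give exactly 1.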

Measuring civic engagement

Volunteering rate

We start with the set of all Facebook Groups in the United States that are predicted to be about volunteering or activism on the basis of their titles and that do not have the privacy setting 'secret' enabled. To further improve this classification, we manually review the 50 largest such groups in the United States and the largest such group in each state, and remove the very small number of groups that are clearly misclassified. We then define the volunteering rate as the share of Facebook users in an area who are members of at least one volunteering or activism group.
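Once groups have been classified, the volunteering rate reduces to a set computation. In this sketch the title-based classifier and the manual review are taken as given inputs, and the group names and user identifiers are hypothetical.

```python
def volunteering_rate(area_users, group_members, volunteer_groups, misclassified=()):
    """Share of an area's users who belong to at least one volunteering/activism group.

    volunteer_groups: groups flagged by an upstream title-based classifier;
    misclassified: groups removed after manual review.
    """
    kept = set(volunteer_groups) - set(misclassified)
    members = set()
    for g in kept:
        members |= group_members.get(g, set())
    area = set(area_users)
    return len(area & members) / len(area)


group_members = {"food_bank": {1, 2, 9}, "cleanup": {2, 3}, "bake_sale": {4}}
rate = volunteering_rate(
    [1, 2, 3, 4], group_members, ["food_bank", "cleanup", "bake_sale"],
    misclassified={"bake_sale"},
)
```

User 9 belongs to a volunteering group but lives outside the area, so they do not count toward this area's rate.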

Civic organizations

We start with the set of all Facebook Pages in the United States that are categorized as 'public good' pages on the basis of the page title and page category. We then remove pages that do not have a linked website, a description on their Facebook page or a listed address. We then assign each page to a ZIP code and county on the basis of its listed address, and calculate the density of civic organizations as the number of such pages per 1,000 Facebook users in the area.
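The filtering and density steps can be sketched as follows. The page records use hypothetical keys ('category', 'website', 'description', 'address', 'zip'), and geocoding the listed address to a ZIP code is assumed to have happened upstream.

```python
def civic_org_density(pages, fb_users_by_zip):
    """Civic organization pages per 1,000 Facebook users, by ZIP code.

    pages: dicts with hypothetical keys 'category', 'website', 'description',
    'address' and 'zip' (the ZIP code derived from the listed address).
    """
    counts = {}
    for page in pages:
        if page.get("category") != "public good":
            continue
        # Drop pages without a linked website, a description or a listed address.
        if not (page.get("website") and page.get("description") and page.get("address")):
            continue
        counts[page["zip"]] = counts.get(page["zip"], 0) + 1
    return {z: 1000 * counts.get(z, 0) / n for z, n in fb_users_by_zip.items()}


full = {"website": "w", "description": "d", "address": "a"}
pages = [
    {"category": "public good", "zip": "02138", **full},
    {"category": "public good", "zip": "02138", "description": "d", "address": "a"},
    {"category": "restaurant", "zip": "02138", **full},
    {"category": "public good", "zip": "10001", **full},
]
density = civic_org_density(pages, {"02138": 2000, "10001": 500, "60601": 1000})
```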


Unless otherwise noted, we weight all correlations and regressions by the number of individuals with below-national-median parental income, as calculated using Census data [72]. Unless otherwise noted, we cluster standard errors by commuting zone in county-level regressions and by county in ZIP-code-level regressions, to adjust for potential spatial autocorrelation in errors.
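A minimal sketch of a weighted bivariate regression with cluster-robust standard errors. It uses the basic CR0 sandwich estimator with no finite-sample correction; the software and exact correction used in the paper are not stated here, so this is an assumption, and the county values, weights and commuting-zone labels are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def wls_cluster(x, y, w, clusters):
    """Weighted least squares of y on [1, x] with cluster-robust (CR0) standard errors.

    Returns (slope, standard error of slope). No small-sample correction is applied.
    """
    # X'WX for the design [1, x] and its inverse (the "bread").
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    bread = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    # X'Wy and the coefficient estimates.
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    intercept = bread[0][0] * t0 + bread[0][1] * t1
    slope = bread[1][0] * t0 + bread[1][1] * t1
    # Sum weighted score contributions within each cluster (e.g. commuting zone).
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, wi, g in zip(x, y, w, clusters):
        resid = yi - intercept - slope * xi
        scores[g][0] += wi * resid
        scores[g][1] += wi * resid * xi
    # Var(slope) = [bread @ meat @ bread][1][1] = sum_g (bread row 2 . score_g)^2.
    var_slope = sum((bread[1][0] * u0 + bread[1][1] * u1) ** 2 for u0, u1 in scores.values())
    return slope, sqrt(var_slope)


# Hypothetical counties: EC, upward mobility, below-median-income population
# (weights) and commuting-zone identifiers (clusters).
ec = [0.6, 0.8, 0.9, 1.1, 1.2, 0.7]
mobility = [38.0, 41.5, 43.0, 46.5, 47.0, 40.0]
weight = [5_000, 12_000, 3_000, 8_000, 2_500, 9_000]
cz = ["cz1", "cz1", "cz2", "cz2", "cz3", "cz3"]
slope, se = wls_cluster(ec, mobility, weight, cz)
```

Clustering sums residual contributions within each commuting zone before squaring, so errors that are correlated among nearby counties widen the standard error instead of being treated as independent.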

The causal effect estimates used in the 'Causal effects of place versus selection' section are identified only from individuals who move across areas. They are therefore much less precise than the baseline observational estimates of economic mobility used in the rest of the paper, which makes it necessary to adjust correlations involving them for attenuation bias due to sampling error. We adjust for attenuation bias by dividing the raw correlation between the causal estimates of mobility and EC by the square root of the reliability of the causal estimates of mobility, as estimated by Chetty and Hendren [76]. The causal effect estimates are also unavailable at the ZIP-code level owing to small sample sizes for ZIP-code-level moves. This is why we focus on the observational estimates of upward income mobility in our baseline analysis.
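The attenuation correction is a one-line formula. If a noisy estimate equals its signal plus independent noise, with reliability = Var(signal)/Var(estimate), and the other variable is measured precisely, the observed correlation equals the signal correlation times √reliability; dividing by √reliability undoes that shrinkage. The inputs in the usage line are hypothetical.

```python
from math import sqrt


def disattenuate(raw_corr, reliability):
    """Correct a correlation for sampling noise in one of the two variables.

    With estimate = signal + independent noise and reliability =
    Var(signal) / Var(estimate), the observed correlation is shrunk by
    sqrt(reliability), so dividing by it recovers the signal correlation.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return raw_corr / sqrt(reliability)


corrected = disattenuate(0.3, 0.64)  # hypothetical raw correlation and reliability
```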

Privacy and ethics

This project focuses on drawing high-level insights about communities and groups of people, rather than individuals. We used a server-side analysis script that was designed to automatically process the raw data, strip the data of personal identifiers, and generate aggregated results, which we analysed to produce the conclusions in this paper. The script then promptly deleted the raw data generated for this project. Although we used various publicly available sources of aggregate statistics to supplement our analysis, we do not link any external individual-level information to the Facebook data. All inferences made as part of this research were created and used solely for the purpose of this research and were not used by Meta for any other purpose.

A publicly available dataset, which includes only aggregate statistics on social capital, is available at … We use methods from the differential privacy literature to add noise to these aggregate statistics, protecting privacy while maintaining a high level of statistical reliability; see … for further details on these procedures. The project was approved under Harvard University IRB 17-1692.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.


