Do Editors or Articles Drive Collaboration? Multilevel Statistical Network Analysis of Wikipedia Coauthorship

Brian Keegan, Darren Gergle, Noshir Contractor
School of Communication, Northwestern University
2240 Campus Drive, Evanston, IL 60201
{bkeegan, dgergle, nosh}@northwestern.edu

ABSTRACT
Prior scholarship on Wikipedia’s collaboration processes has examined the properties of either editors or articles, but not the interactions between both. We analyze the coauthorship network of Wikipedia articles about breaking news demanding intense coordination and compare the properties of these articles and the editors who contribute to them to articles about contemporary and historical events. Using p*/ERGM methods to test a multi-level, multi-theoretical model, we identify how editors’ attributes and editing patterns interact with articles’ attributes and authorship history. Editors’ attributes like prior experience have a stronger influence on collaboration patterns, but article attributes also play significant roles. Finally, we discuss the implications our findings and methods have for understanding the socio-material duality of collective intelligence systems beyond Wikipedia.

Author Keywords
Wikipedia; collaboration; network analysis; coauthorship; exponential random graph model; ERGM; socio-material

ACM Classification Keywords
H.5.3 Theory and Models, Computer-Supported Cooperative Work

General Terms
Human Factors; Measurement; Theory

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CSCW’12, February 11–15, 2012, Seattle, Washington, USA.
Copyright 2012 ACM 978-1-4503-1086-4/12/02...$10.00.

INTRODUCTION
“What makes Wikipedia work?” is a pervasive question in the literature on computer-supported cooperative work and social computing. The motivations of editors, design features of the community, affordances of peer production, and other latent social processes interact in complex ways at multiple levels to enable and sustain this massive collaboratively authored online encyclopedia.

Previous research makes clear that editors of Wikipedia articles fulfill distinct and diverse collaboration roles and that different types of articles employ different forms of coordination [1, 2]. However, extant scholarship has not examined the interaction between these features: how do editors with particular skills self-organize around articles requiring different forms of collaboration?

To answer this question we examine a “boundary condition” for self-organization on Wikipedia. We compare the coauthorship of Wikipedia articles about current and breaking news events such as commercial airline disasters to topically similar articles about historical airline disasters. Articles about breaking news are coauthored under “high tempo” conditions which demand unique forms of coordination to manage interdependencies. We analyze the coauthorship networks of high and low tempo articles as well as the attributes of editors who contribute to them.

We review existing scholarship on the duality of Wikipedia as both user action embodied in artifacts and processes which support collaboration. Integrating this review, we develop a multi-theoretical, multi-level model describing how features of editors, articles, and interactions between both influence collaboration structure [3, 4]. Our findings suggest that while the features of articles and attributes of editors both influence structure, editors’ experience more strongly governs the types of editors they collaborate with and the types of articles they work on. These findings have implications for how task demands intersect with user attributes to structure self-organizing collaborations.

We also make a methodological contribution by demonstrating how a class of statistical methods called p*/exponential random graph models (p*/ERGMs) enables multi-level network analysis. We specify statistical parameters which correspond to processes operating at each of the article and editor levels to disentangle which are more influential on collaboration structure. We discuss the implications p*/ERGM methods have for analyzing and comparing multi-level social interactions in other domains.

BACKGROUND
As is the case with many online communities, the majority of contributions to Wikipedia come from a fraction of the entire user base [5, 6]. Despite this disparity in effort, online communities like Wikipedia are able to escape traps such as the tragedy of the commons and social loafing owing to members’ uses and gratifications [7], diverse motivations [8], affordances of peer production [9], and design features of the community itself [10]. These approaches generally emphasize the agency of individuals to form and realize their own goals. In the context of Wikipedia, editors are motivated to fulfill particular roles like copyediting or vandal fighting [1, 11], are socialized into sub-communities with like-minded collaborators [12], and recognize or revert the contributions of other editors [13].

Peer-production communities are oriented around producing and maintaining an artifact such as an operating system or encyclopedia article. Although these artifacts are the agglomeration of individual users’ actions, artifacts like Wikipedia articles are embedded within socio-technical systems which imbue them with innate agency. This material agency enables them to operate outside of the control of any single person and emerges from the network of user and system interactions [14]. In the context of a Wikipedia article, content added or reverted by other editors, markup language interpreted into style formatting by the MediaWiki software, and page protections enforced by administrators are examples of articles “acting” on their “own” outside of any one user’s direct control.

Coauthorship patterns on Wikipedia thus foreground the duality of persons and material artifacts [14-16]: collaborations occur around articles exhibiting particular features but articles also emerge from the contributions of editors with distinct traits. The material agency of articles such as topic, quality, age, or number of contributors influences the types of editors who are capable of making further contributions to an article. For example, a featured article about Barack Obama probably will not preserve contributions from newly-registered editors identifying with the Tea Party movement. However, the properties of editors also influence the types of articles they choose to edit and maintain. The human agency of these actors manifests in attributes such as varying expertise, editing experience, and roles. For example, a college freshman who uploads photos about soccer players will be unlikely to take up correcting formulae on general relativity. This suggests the features and attributes of both articles and editors influence the self-organization of collaboration on Wikipedia.

While scholars have articulated rationales for how and why collaboration and social action emerge from both internal human agency (what we refer to as editor-focused attributes) as well as external social foci (what we refer to as article-focused features), the interaction between these two approaches has not been studied. We review and identify themes from each of these literatures to motivate a statistical analysis which allows us to “decouple” this duality and model the individual influences as well as interactions between editor attributes, article features, structural pattern of editors’ contributions to articles, and structural patterns of an article’s contributions from editors. We advance CSCW scholarship by using a statistical approach that allows us to simultaneously examine how the features of articles, attributes of editors, and these network structures all influence the organization of Wikipedia collaborations.

While processes which occur at a single level of analysis (such as 1 and 2 below) certainly play a role, we expect the interactions between each of (1) an editor’s attributes and his or her history of editing other articles, and (2) an article’s features and its history of revisions from other editors, will provide a more complete account of self-organization on Wikipedia.

1. Editor based attributes. For example, do experienced editors contribute to more articles than non-experienced editors?
2. Article based features. For example, do breaking news articles involve more editors than traditional articles?
3. Editor-focused interactions with article features. For example, is an experienced editor more likely to contribute to breaking news articles if they previously contributed to other breaking news articles than a non-experienced editor?
4. Article-focused interactions with editor attributes. For example, are breaking news articles more likely to attract contributions from experienced editors if other experienced editors have also contributed than non-breaking news articles?

In the following sections we examine the prior literature that has typically focused either on article-focused features or editor-focused attributes and aim to consolidate this work using a multi-theoretical, multi-level modeling approach that allows us to describe how features of articles, attributes of editors, and the interactions between them influence the structure of collaborations on breaking news events about commercial airline disasters.

Article-focused: Task Coordination and Social Foci
The demands of coordinating coauthorship on Wikipedia articles vary substantially with the age of the article and the number of contributors to it [2, 17]. An additional dimension is the contemporary salience of an article. Like other forms of social media [18], current and breaking news events uniquely motivate editors to contribute to and collaborate in Wikipedia [19]. However, co-authoring an article about breaking news events like commercial airline disasters involves complex, time-sensitive, and highly interdependent tasks. In this section we review how the features of breaking news articles not only influence patterns of coauthorship, but how these article-focused features interact with the attributes of the editors who contribute to these articles.

Although prior research suggests the compounding coordination costs of many editors engaged in interdependent work will inhibit the development of high quality Wikipedia articles [2, 17], articles about breaking news complicate this assumption. On one hand, Wikipedia articles about breaking news events are often perceived to be exemplars of timeliness, breadth, and reliability in the immediate aftermath of an event like the Virginia Tech massacre [20]. On the other hand, the concentration of new editor activity is also densest while the article is less than 24 hours old and being intensively developed [19].

The popularity and quality of these articles in spite of these constraints pose a paradox in which “breaking articles” remain high quality in spite of the number of editors attempting to make simultaneous contributions with incomplete information and no centralized coordination. Examining the features of only articles or editors may present an incomplete picture, and the novel coordination processes which enable these articles to be rapidly authored but also high quality likely emerge from the interactions between features of the article and attributes of its editors.

We argue these breaking news articles belong to a class of high-tempo collaborations characterized by non-routine and extremely urgent work, abrupt consequences, and intense attention. Coordination in these volatile environments demands high levels of heedful and interrelated action, knowledge integration, and information processing [21]. These “emergent response groups” are unique because group members have diverse motivations, mixed perspectives, varied resources to contribute, and substantial volition to come and go as they please. These factors contribute to unstable task definitions and the pursuit of potentially conflicting goals [22].

Members of these collaborations adapt by re-tailoring and sharing their particular expertise, emphasizing trust through action rather than credibility through expertise, and relying on narratives and knowledge artifacts to document actions taken [22]. What emerges from the on-going and repeated interactions between both editors and the article as they expand, update, copy edit, and fight vandals is not only the content of the article but also an artifact narrating prior actions and decisions.

Thus, an article feature such as being breaking or non-breaking is an important variable for modeling the collaboration patterns of editors. The salience of the topic and demands of coordinating interrelated tasks make breaking news articles foci which actively bring people together and shape their collective action much more than articles about historical events which do not demand high-tempo collaborations [23]. Therefore, we expect that articles about breaking news events may attract more editors than Wikipedia articles about non-breaking news events. Thus, we expect:

H1: Article attributes like salience will co-vary with number of editors. Breaking news articles will attract more editors than non-breaking articles.

There are latent tendencies for breaking news articles to receive many contributions (discussed above) or experienced editors to simply be prolific or engaged in many articles (discussed in the next section). However, considering the features of the article absent the attributes of the editors who contribute to it is necessarily incomplete. For example, the “sink or swim” coordination demands of a breaking news article or attempts to limit “dysfunction from diversity” [24] may predispose editors to only want to collaborate with other editors who exhibit similar characteristics or qualifications as themselves on a breaking news article. If other experienced editors are contributing this may be social proof about the collaboration and may attract other experienced editors. This would manifest as an article-focused homophily in which the attributes of the article result in similar kinds of editors collaborating [25].

H2: Article attributes like salience will co-vary with the types of editors who collaborate with each other. Experienced editors will co-author with other experienced editors on breaking news articles.

Editor-focused: Social Roles and Identity
An alternative rationale for why collaborations around breaking news articles exhibit different processes of self-organization revolves around the attributes of the editors who contribute to these articles rather than the features of the articles themselves. Editor-focused attributes like experience can potentially explain why some editors contribute to more articles than others, but also how these editor attributes interact with article features and lead users to fulfill distinct roles contributing to particular types of articles.

Roles in social media manifest as behavioral regularities, structural position, social action, or self-identification. These roles form complex ecologies which are defined in relation to other roles such as substantive experts, technical editors, counter-vandalism, and community building [26]. Other typologies have identified the placeholders, completers, housekeepers, and shapers who contribute, integrate, and shape content on Wikipedia [27].

In general, roles are resources that enable individuals to adapt to new contexts by creating new structures as well as imitating behaviors that were previously successful [28]. Because dedicated Wikipedia editors have distinct but stable behavioral patterns [6], editors can be classified into distinct roles based on the distribution of their activity [1]. The patterning of interactions among editors inhabiting particular social roles across different breaking news article collaborations can potentially explain how breaking news articles are co-authored in spite of steep coordination costs.

Editors may have a particular interest in contributing to topical areas such as airliner disasters (the topic explored in this paper). Experienced editors in this domain may have deep expertise about the appropriate vocabulary and style for describing an accident or knowledge about relevant citations [12]. Thus, these editors may be fulfilling “caretaker” roles in which they edit many articles while less experienced editors specialize on fewer articles.
H3: Editor attributes like experience will co-vary with the number of articles edited. Experienced editors contribute to more articles than other editors.

Again, considering the attributes of editors separately from the features of the articles they edit presents an incomplete account by divorcing the role editors play from the resources to which they contribute. Experienced editors may self-identify as “breaking news editors” who preferentially edit these articles out of novelty or immediate gratification of contributing to an in-demand information artifact. Experienced editors preferring to edit only breaking news articles would manifest as a pattern of editor-focused homophily in which the attributes of editors cause them to contribute to similar kinds of articles. Thus, we expect:

H4: Editor attributes like experience will co-vary with the types of articles they edit. Experienced editors will be more likely to contribute to similar types of articles than dissimilar types of articles.

[Figure 1 panels. Article-focused interaction, breaking article (square) and: Only Non-Exp.; Non-Exp. & Apprentice; Only Apprentices; Non-Exp. & Experienced; Apprentice & Experienced; Only Experienced. Editor-focused interaction, experienced editor (circle) and: Only Breaking; Breaking & Contemp.; Only Contemp.; Breaking & Historical; Contemp. & Historical; Only Historical.]
Figure 1: Visualization of p*/ERGM attribute interaction parameters that capture varying attributes of both the editors and the articles in Wikipedia. Dark-red circles are expert editors, medium-red circles are apprentice editors, and light-red circles are non-expert editors. Dark-blue squares are breaking articles, medium-blue squares are contemporaneous articles, and light-blue squares are historical articles.

Alternative Explanations for Collaboration Structure
In addition to the hypothesized explanations, we expect a variety of alternative processes could account for differences in collaboration structures between articles covering breaking, contemporary and historical events. We control for the influence of these factors by including them as parameters in the model in addition to our hypothesized variables of interest (breaking news article and editor experience).

Article features such as the severity of a catastrophe, proximity to developed countries, evaluated quality, and article length are also likely to influence collaboration patterns. For example, in our corpus of breaking news events centering on airline disasters we expect that the number of fatalities and survivors of an accident will be strongly correlated with the amount of attention an article receives from Wikipedia editors. Wikipedians also exhibit a “self-focus bias” in which geographic proximity influences why articles appear in one language but not another [29]. We expect that accidents occurring within or near developed countries will receive more attention from editors than accidents occurring in developing countries. Finally, a number of studies have identified that the number of editors, length of an article, and article quality are all correlated [30, 31]. We use both the article quality and article length as controls on the number of editors who contributed to the article.

Editor attributes such as tenure in the community and whether or not the editor is registered are also likely to influence the likelihood of editors making contributions to articles of various types. Editors who started editing earlier in Wikipedia’s history likely have greater familiarity with best practices and may be more involved in editing many articles [12]. We also expect registered editors are both highly motivated and more committed to the community, both of which lead them to make more contributions than non-registered editors [10].

OUR APPROACH
Establishing which of the collaboration processes is most influential requires a model accounting for the simultaneous contribution of each of these processes. However, owing to the methodological limitations of common types of network analysis, studies often only examine one level of analysis which accounts for little of the overall variance in the network. Analyzing the effects of network parameters interacting at different levels of analysis allows us an integrated test of complementary and competing theories of how network structure emerges [3, 4].

While descriptive approaches and regression analyses serve valuable purposes for answering particular research questions, these approaches are limited to analyzing the properties in a “snapshot” of a network. These kinds of analyses cannot explain the endogenous processes of how the network structures itself nor the simultaneous influence of exogenous actor-level attributes on the network structure.

Statistical models like p*/ERGMs extend the logic of multivariate logistic regression to relational data: the presence or absence of a binary tie in a network is an outcome variable predicted by a vector of independent variables called parameters. These parameters correspond to theoretically-motivated structural characteristic(s) we believe are more or less likely to occur in the distribution. Visualizations of these parameters are provided in Figure 1.

Because the likelihood that a network tie is present or absent in a network is not independently and identically distributed (IID) from other network ties’ likelihoods, a statistical model must account for these dependencies [32]. In a p*/ERGM, parameters allow the models to reflect dependencies on both endogenous tendencies for ties to exist because of other local structures in the network (e.g., popular articles continue to attract more editor attention) as well as exogenous attribute covariates (e.g., experienced editors prefer to work with other experienced editors). These methods allow us to test hypotheses about the tendency for ties to form as a result of the properties of the sending node, receiving node, as well as the presence and properties of other local ties and nodes.

The model produces a set of parameter estimates whereby estimates of zero indicate the modeled effect does not alter the likelihood from random chance, a positive parameter suggests the effect increases the likelihood of a tie, and a negative parameter implies the effect decreases the likelihood of a tie. Significance is tested using a t-ratio: an estimate is concluded to be significant when the absolute value of this ratio exceeds a critical t-value of 1.96. Details about the specification, estimation, and simulation of p*/ERGMs are beyond the scope of this paper but can be found in [32-35].

DATA, VARIABLES, AND METHODS
A variety of Wikipedia article genres like natural disasters, sporting events, and political scandals exhibit high tempo features which should require unique forms of coordination. We examine articles about commercial airline disasters for this study because these incidents occur with sufficient regularity to generate a large sample but are also “normal accidents” involving complex technology with prompt and serious consequences which make them reliably notable events warranting coverage in Wikipedia.

Articles for the corpus were drawn from the “List of accidents and incidents involving commercial aircraft” (http://en.wikipedia.org/wiki/List_of_accidents_and_incidents_involving_commercial_aircraft). Our sample excludes hijackings and other instances of terrorism (such as the four flights involved in the September 11 attacks) because these incidents represent an archetype of catastrophe with distinct attention, salience, and causal attributions compared to “typical” accidents attributable to crew error, mechanical failure, or weather conditions. The resulting corpus includes 249 articles about commercial airline disasters which occurred between January 1990 and December 2010.

Based on the list of articles identified above, we developed a tool to extract and stitch together XML revision histories for each article using Wikipedia’s “Special:Export” function (http://en.wikipedia.org/wiki/Special:Export). These data include revision-level data about the article name and ID, editor name and ID, timestamp, content, and comments. Registered users (n=6,462) identified by names and unregistered users (n=7,830) identified by IP addresses each have unique IDs. 14,292 unique user accounts made 58,500 revisions to this corpus between September 16, 2001 and May 24, 2011.

Each revision’s article and editor ID were recorded as a duple representing an editor’s modification of an article. Because p*/ERGMs estimate the binary presence or absence of a link rather than the weight or strength of a link, repeated editor-article duples were discarded, creating a binary edgelist of 23,903 unique editor-article interactions. The edgelist was imported to the statnet statistical network analysis package in R for analysis using the ergm library [36]. The size of the resulting network required us to analyze the data on high-memory computing instances on Amazon Web Services’ Elastic Compute Cloud (AWS EC2) infrastructure.

Bipartite Network Modeling
We conceptualize the Wikipedia revision data as a network consisting of a set of actors and a set of relationships among these actors. While traditional network analysis emphasizes unipartite or one-mode data where the relationships are between a single type of actor (i.e., people-to-people), two unique sets of actors actually exist in Wikipedia: editors and articles.

Because it is nonsensical for an article to edit another article or a user to edit another user, we structure our interaction data as a bipartite graph in which the nodes can be partitioned into exactly two mutually exclusive sets of actors such that ties only exist between sets and no ties exist within a set [37]. Thus, a link exists between an editor node, E, and an article node, A, if E made a contribution to A, but neither E-E nor A-A links can exist. This bipartite structure is alternatively referred to as a “two-mode” or “affiliation” network. A bipartite network is a natural approach for modeling collaboration because it explicitly models the duality of persons and groups: a link between the social actors as one level of analysis and the groups to which they belong as another [15]. We employ p*/ERGM parameters designed specifically for bipartite networks [34, 35].

Node Attribute Construction
A variety of article and editor variables were extracted from either the revision histories or article content to provide covariates for control, analysis, and modeling.

• Article attributes – The date of the incident and timestamp of the first edit to the article were recorded.
Based upon the difference between these times and Wikipedia’s creation in January 2001, three possible types of articles exist: breaking articles about events which are written within 72 hours of the incident (n=93), contemporaneous articles about events which happened while Wikipedia existed but were written well after the incident (n=50), and historical articles about events which predated Wikipedia’s existence and are thus written well after the event (n=106).
• Other article attributes – The word count excluding markup syntax is recorded as a continuous variable. The modal article quality as evaluated by associated WikiProjects is coded as one of seven categorical attributes ranging from Stub to Featured Article-class.
• Incident attributes – The counts of fatalities and survivors are recorded for each event as controls for the severity and thus salience of a particular incident. A dummy control variable for OECD location coded whether or not the crash occurred within or off the coast of one of the 34 developed member nations of the OECD.
• Editor attributes – The total number of revisions an editor made within the corpus over the 10 year span of time was recorded as an edit count. This count was binned into three categories to capture topical editing experience: 1 to 3 contributions (the mean is 4) was coded as non-experienced (n=12,148), 4 to 42 contributions (up to the mean plus one standard deviation) as apprentice (n=1,992), and 43 or more contributions as experienced (n=152). Editors who had registered accounts (n=6,462) were dichotomized from non-registered users (n=7,830). Finally, editors were partitioned into three classes: early contributors who made their first contribution to the corpus before 2008 (n=5,366), middle contributors who made their first contribution to the corpus from 2008 to 2010 (n=6,224), and late contributors who made their first contribution to the corpus between 2010 and the present (n=2,702).

P*/ERGM CAPTURES MULTILEVEL INTERACTIONS
We specify one large p*/ERGM which includes single level main effects and structural tendencies (summarized in Table 1) as well as multi-level attribute interaction parameters (summarized in Table 2) to test our four hypotheses. This method reproduces similar “main effects” findings for the control variables as previous studies: the likelihood of editors revising an article increases with every additional fatality associated with the incident, and is higher for incidents occurring in or near OECD nations, higher quality articles, longer word counts, early contributors, and registered users.

Parameter                                   Estimate (t-statistic)
Structural tendencies
  Edges                                     -2.626 (-217.0)***
  Multi-editor article tendency             0.01321 (2.89)***
  Multi-article editor tendency             0.08042 (36.9)***
  Article degree distribution (α=2.5)       -4.476 (-16.8)***
  Editor degree distribution (α=0.25)       4.616 (68.6)***
Article features
 Controls
  Fatalities                                1.22E-03 (20.9)***
  Survivors                                 1.61E-04 (1.40)
  Location: OECD                            0.145 (24.9)***
  Quality – Start                           0.158 (21.9)***
  Quality – C                               0.371 (56.1)***
  Quality – B                               0.366 (93.4)***
  Quality – GA                              0.309 (11.2)***
  Quality – FA                              0.127 (1.38)
  Word Count                                6.51E-05 (8.00)***
 Main effects (Hypothesis 1)
  Temporal Type – Contemporaneous           -0.549 (-33.1)***
  Temporal Type – Historical                -0.519 (-45.3)***
Editor attributes
 Controls
  Registered                                0.700 (86.3)***
  Cohort – Middle contributor               -0.137 (-6.87)***
  Cohort – Late contributor                 -0.154 (-3.45)***
 Main effects (Hypothesis 3)
  Experience – Apprentice                   -4.14 (-525.2)***
  Experience – Experienced                  -2.00 (-66.3)***

Table 1: “Main effect” p*/ERGM estimates (t-statistics). Estimates are net of parameters in Table 2. Cells are shaded green for positive & significant estimates and red for negative & significant estimates. *** p < 0.001, ** p < 0.01, * p < 0.05.

We also specify endogenous structural parameters which control for the latent tendency for links to be created by chance (edges), the network-level tendencies for articles or editors to become highly centralized (article and editor degree distribution), and the local-level tendencies for articles to accumulate multiple editors (multi-editor article tendency) as well as editors to contribute to multiple articles (multi-article editor tendency). The estimates for these parameters are listed in Table 1 as “structural tendencies.” The negative estimate for the edges parameter reflects the log-odds of a network tie appearing entirely by chance and serves as the “intercept term” reflecting the density of the network if no other effects were present. This can also be interpreted as the “cost” of creating a tie which other structural tendencies, factors, and interactions will need to overcome. The positive multi-editor and multi-article structural parameters respectively reflect the latent tendency for articles to accumulate editors and for editors to edit many articles. The negative article degree distribution reflects the tendency for articles to avoid long-tailed degree distributions while the positive editor degree distribution captures the tendency for editors to have a very skewed distribution. These findings suggest a tie is most likely to form between articles with few co-authors and editors who have also edited many other articles.

We find evidence for H1 that breaking news articles are more likely to attract editors than contemporaneous or historical articles. Both of these article types are less likely to have ties to editors than breaking news articles.
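To make the reading of the structural and main-effect estimates concrete, the following is a minimal sketch in Python rather than the authors' R/statnet workflow. It rests on two labeled assumptions that are standard for logistic-style models but are not spelled out in the text: that the edges estimate, with all other terms held at zero, is the log-odds of a tie, and that exponentiating an estimate gives a multiplicative change in the conditional odds of a tie. The variable names are illustrative; the numbers are copied from Table 1 and from the corpus counts reported under Data, Variables, and Methods.

```python
import math


def logistic(x: float) -> float:
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Estimates copied from Table 1.
EDGES = -2.626
TEMPORAL_TYPE = {"Contemporaneous": -0.549, "Historical": -0.519}

# Corpus counts reported in the text.
N_EDITORS = 14_292
N_ARTICLES = 249
N_TIES = 23_903

# Assumption: with every other term held at zero, the edges estimate is the
# log-odds of a tie, so its inverse logit is the implied baseline density.
baseline_density = logistic(EDGES)

# Observed density of the bipartite network: ties can only run between an
# editor and an article, so the number of possible ties is editors * articles.
observed_density = N_TIES / (N_EDITORS * N_ARTICLES)

# Assumption: exp(estimate) is the multiplicative change in the conditional
# odds of a tie relative to the reference category (breaking articles).
odds_ratios = {name: math.exp(theta) for name, theta in TEMPORAL_TYPE.items()}

if __name__ == "__main__":
    print(f"baseline tie probability from edges term: {baseline_density:.4f}")
    print(f"observed bipartite density:               {observed_density:.4f}")
    for name, ratio in odds_ratios.items():
        print(f"{name}: odds of a tie multiplied by {ratio:.3f}")
```

Under these assumptions the edges term alone implies a tie probability of roughly 0.07, while the observed density is about 0.007; the gap is consistent with the edges term being only an intercept that the other parameters adjust. The two temporal-type estimates correspond to odds multipliers of about 0.58 and 0.60 relative to breaking articles, which is the direction H1 predicts.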
degree distribution), and the local-level tendencies for articles to accumulate multiple editors (multi-editor article “breaking news article” to be edited by actors sharing Article-focused interactions (Hypothesis 2) similar or dissimilar attributes like Breaking Contemp. Historical Only Non-Experienced -0.0109 -0.0101 -0.0086 experienced/apprentice/non-experienced editor. “Editor- (P1) (-2.37)* (-15.5)*** (-1.90) focused interactions” capture the tendency for editors Non-Experienced & -0.0153 -0.00122 -0.01449 possessing an attribute like “experienced” to contribute Apprentice (P2) (-3.33)*** (-2.59)** (-3.02)** articles sharing similar or dissimilar features like Only Apprentices (P3) -0.00317 -0.03674 0.00166 breaking/contemporary/historical article. Figure 1 (-0.68) (-16.9)*** (0.39) Non-Experienced & -0.01588 -0.02601 -0.01590 visualizes six structural signatures for article-focused Experienced (P4) (-3.46)*** (-17.5)*** (-20.7)*** interactions with editor attributes on the left and six Apprentice & 0.00613 0.02869 0.00415 structural signatures for editor-focused interactions with Experienced (P5) (1.10) (28.02)*** (0.64) article features on the right. Only Experienced (P6) -0.03552 0.00578 0.00090 (-19.9)*** (0.85) (0.23) Within “Article-focused interactions”, we observe a general tendency for significantly fewer interactions than would be Editor-focused interactions (Hypothesis 4) Experienced Apprentice Non-Exp. expected by chance between non-experienced editors and Only Breaking (P7) 0.00552 0.11540 -6.027 apprentices (P2) and experienced editors (P4). (1.63) (28.3)*** (-135.9)*** Coauthorship among non-experienced editors (P1) is Breaking & Contemp. -0.06421 0.01308 -5.977 likewise rarer than random chance. 
Likewise, there appear (P8) (-20.7)*** (2.50)* (-52.7)*** to be strong disincentives for apprentice editors to work Only Contemp.(P9) 0.04338 0.23140 -5.116 (3.98)*** (13.0)*** (-32.7)*** with each other (P3) on articles about contemporaneous Breaking & Historical -0.07691 -0.01577 -6.136 incidents. Despite our expectation that experienced editors (P10) (-25.7)*** (-4.22)*** (-81.7)*** would fulfill crucial roles in high tempo collaborations Contemp. & Historical -0.05648 0.08355 -5.517 around breaking news events by intensively collaborating (P11) (-18.5)*** (25.7)*** (-54.8)*** Only Historical (P12) 0.00056 0.15120 -5.468 together, after controlling for variables such as the severity (0.12) (45.5) (-84.6)*** of the event, experienced editors work together on high Table 2: Editor-article “interaction” p*/ERGM estimates (t- tempo collaborations significantly less often than we would statistics) capturing the tendency for the row title to edit/be expect by chance (P6). Nevertheless, the other findings edited by the column title. Estimates are net of parameters in support H2 that the coordination demands of an article Table 1. Labels in parentheses next to attribute names are influence the tendency of editors with similar or dissimilar references for attribute interaction parameters. levels of experience to work together. p = *** < 0.001 < ** < 0.01 < * < 0.05 Within “Editor-focused interactions”, editors of all levels of However, the p*/ERGM estimates testing H3 invert our experience are unlikely to contribute to both breaking and expectation that experienced editors would be more likely historical articles (P10). The lack of shared coauthorship on to contribute to many articles than non-experienced editors. breaking and historical articles suggests these are very The negative estimates imply apprentice and experienced distinct sub-genres with limited interaction between each editors are much less likely than non-experienced editors to groups’ editors. 
However, apprentice editors have a edit many articles. Experienced editors (within our corpus) tendency to edit diverse combinations of articles above and make repeated contributions to a few articles (again only beyond the latent tendencies for editors to edit many within our corpus) rather than a few contributions to many articles. This offsets the main effect for apprentice editors articles. This specialization points toward rejecting H3. to be unlikely to contribute to articles in general as well as the lack of a bridging role by experienced editors. Modeling Attribute and Structural Interaction Effects p*/ERGM methods stand out from traditional regression Experienced editors contribute to contemporary articles approaches in their ability to model the interactions (P9) at a rate much greater than chance and make between local editor-article authorship structure, editor contributions to different types of articles (P8, P10, P11) at attributes, and article features with parsimonious and rates much less (respectively) than expected by chance. statistically-valid parameters. We employ bipartite Contrary to our expectations, the effects of experienced p*/ERGM parameters like those visualized in Figure 1 and editors’ sustained contributions to only breaking (P7) or use them to model the complex interactions between article historical articles (P12) are weak and non-significant. features and editor attributes. These feature and attribute Highly experienced editors are instead characterized by estimates are summarized in Table 2. deep and sustained involvement in a few articles rather than stewardship of many articles. Instead, it is the apprentice There are two broad classes of interactions reflecting the editors who appear to play a crucial role not only two possible explanations for editor-focused or article- contributing to many articles but also acting as crucial focused processes to influence the collaboration structure. 
brokers providing bridges within breaking (P7), We use editor experience and article temporal type as the interacting attributes. “Article-focused interactions” capture contemporary (P9), and historical articles (P12) as well as the tendency for articles possessing a feature such as between these article types (P8, P11). Again, these findings by not only analyzing processes occurring at different levels of analysis but also by modeling the interactions between these levels in addition to controlling for potentially confounding processes. p*/ERGM statistical models allowed us to disambiguate between the article-focused and editor-focused interactions by specifying a model which simultaneously incorporates each of these potentially confounding processes to assess the relative contribution of each to the network structure. This approach revealed new insights regarding how Wikipedia editors and articles self- organize in relation to one another. Prior scholarship has either developed editor-focused accounts examining how editor attributes (e.g., experience) influence collaboration patterns [1, 26] or article-focused accounts of why some articles features (e.g., task Figure 2: Degree distribution with mean values from 10,000 coordination demands) lead to higher quality or more simulated networks based on estimated model (in yellow) contributions [2, 17]. Our analysis is the first to compared against observed values (in blue). simultaneously look at both levels of analysis to better understand the relationship each has on the self- support H4 that an editor’s level of experience will organization of collaborations involving extreme influence the tendency for them to edit similar or dissimilar coordination demands and varied editor experience. Our types of articles. 
results suggest that editor experience and the features of The magnitude of the coefficients for these editor-focused articles in their contribution history have a stronger main effects and interaction parameters are generally larger influence on the self-organization of the collaboration than than the article-focused parameters. This suggests the article features like coordination demands and the attributes attributes and structural interactions focused on editors play of editors who contribute to these articles. Our approach a stronger role in explaining the presence and absence of provides a more complete account of the processes which links between editors and articles than the features and influence the structure of collaborations on Wikipedia than structures focused on articles. Thus, editor attributes are looking at the structure of network of just editors or articles. more influential on the self-organization of high tempo Applying this approach to the paradox of how breaking Wikipedia collaborations than the features of articles. news articles exhibit high quality despite steep coordination Confirming Goodness-of-fit by Simulation costs and varied editor experience, we unpacked how the The previous steps analyzed local-level processes but are attributes of editors have greater influence over this self- these features sufficient to explain global network organization. These findings validated our hypotheses that properties? We assess the model’s goodness-of-fit by not only are the coordination demands of articles matched simulating other networks based on this model and use the with the number of editors who contribute to them (H1), but resulting distribution of networks to compare the properties that coordination demands of certain article types also lead of these generated networks to the observed network [32]. editors to seek or avoid other types of editors depending on Using the ergm package’s “gof” function, we simulate a the type of editor (H2). 
sample of 10,000 networks based on the p*/ERGM in However, as measured by both effect size and valence, the Tables 1 and 2 and measure fit using the degree features of an article and its interactions with editor distribution. Figure 2 plots the observed values (in blue attributes play a secondary role in structuring the circles) and distribution of simulated values (in yellow collaboration as compared to the attributes of editors and boxes) for the combined degree distribution for both modes their interactions with article features. Although of the network. We observe a good-fitting model because experienced editors exhibit a tendency toward concentrating the observed distributions are almost completely bounded their work in a few articles (H3), we found evidence that an by the distributions from simulated networks. editor’s level of experience leads them to also work on or DISCUSSION avoid certain articles depending on the type of article (H4). Adopting a socio-material approach which recognizes the While previous work examined how editors’ varying level agency of both articles and editors to influence the self- of expertise influenced how Wikipedia tools were used or organization of collaboration requires analyzing the other users were perceived and rewarded [12, 13], we interactions between both articles and editors. We demonstrate that editors’ patterns of contributions are incorporated editor attributes and article features by mediated through their own intrinsic attributes, the modeling their interactions as a bipartite graph and using coordination demands of an article, the kinds of articles p*/ERGM methods. We expanded on previous approaches they have contributed to in the past, and the types of editors who also contributed to those articles. 
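
The comparison of effect sizes in this section can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not the authors' analysis code (the estimation itself used the ergm package). It assumes the usual reading of a p*/ERGM estimate as a change in the conditional log-odds of a tie, and it assumes the reported t-statistics can be referred to a standard normal distribution. The helper names `odds_ratio`, `two_sided_p`, and `stars` are hypothetical; the numeric values are copied from Tables 1 and 2.

```python
import math

# (estimate, t-statistic) pairs copied from Tables 1 and 2.
ESTIMATES = {
    "Registered": (0.700, 86.3),
    "Experience - Apprentice": (-4.14, -525.2),
    "Experience - Experienced": (-2.00, -66.3),
    "P1 Only Non-Experienced x Breaking": (-0.0109, -2.37),
    "P1 Only Non-Experienced x Historical": (-0.0086, -1.90),
    "P7 Only Breaking x Apprentice": (0.11540, 28.3),
}


def odds_ratio(estimate):
    """Multiplicative change in the conditional odds of a tie for a
    one-unit increase in the corresponding network statistic."""
    return math.exp(estimate)


def two_sided_p(t_stat):
    """Two-sided p-value, ASSUMING a standard normal reference
    distribution for the reported t-statistic."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def stars(p):
    """Significance stars using the thresholds given in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    for name, (est, t) in ESTIMATES.items():
        p = two_sided_p(t)
        print(f"{name:38s} OR={odds_ratio(est):8.4f}  p={p:.3g} {stars(p)}")
```

Under these assumptions, the apprentice main effect of -4.14 corresponds to an odds ratio of roughly 0.016, while the article-focused interaction estimates in Table 2, all smaller than 0.04 in absolute value, correspond to odds ratios within about 4% of 1. This is one way to read the claim that editor-focused parameters are larger in magnitude.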
Unwinding these intricate dependencies is crucial for understanding the processes which contribute to the formation, maintenance, dissolution, and re-emergence of social and technological interactions in Wikipedia and other online communities and distributed organizations.

Implications
Wikipedia’s coverage of breaking news events suggests that peer production systems are capable of operating far from conditions of stable task demands and community membership [19]. Our findings suggest that tasks which demand high tempo knowledge collaboration may benefit more from matching users to tasks based on their own experience level and history of contributions to similar tasks in the past rather than assembling a team solely by optimizing on the demands of the task or the experience of other members of a team. Recruiting members with diverse backgrounds and interests may increase group productivity [24], but our results suggest that special care should be paid to the particular configurations and combinations of interests rather than dimensionless indices of diversity.

Statistical models allow for a more parsimonious and theoretically coupled representation of dense and complex network structures by capturing the local-level interaction tendencies as well as the emergent macro-level structure [35]. Moreover, complex dependencies in networks are difficult to make sense of with descriptive statistics, visualizations may not provide statistically valid inferences, and, due to the differences in levels of analysis, parameterizing and controlling for these complex dependencies is extremely difficult, if not impossible, with traditional OLS and even hierarchical regression (e.g., mixed model) techniques.

Like regression or other statistical approaches, p*/ERGM methods require specifying models with theoretically well-motivated parameters in addition to translating extant theoretical constructs into appropriate network parameters. Absent a theoretical rationale for model specification and appropriate controls, both types of models can recover spurious relationships. However, these approaches suggest scholars can pose more meaningful research questions about multi-level and multi-theoretical processes of self-organization in collective intelligence systems.

p*/ERGMs also allow comparative network analysis by examining the similarities of the processes which structure networks of very different size, scale, and context [38]. This analysis only looked at one particular sub-genre of articles about airline crashes, but it would be possible to estimate models for other topics with a breaking news component such as earthquakes, hurricanes, or sporting events. p*/ERGMs for each of these could be estimated and a meta-analysis performed to compare the collaboration practices across topics or even other collaboration systems.

Limitations and Future Work
p*/ERGM methods are computationally intensive and become even more so as both the complexity of the model and the number of nodes in the network increase. Although new “peta-scale” computational infrastructures may address these bottlenecks, for the time being extending p*/ERGM methods to very large networks containing tens of millions of nodes like the entire Twitter, Facebook, or Wikipedia graphs is impractical. However, well-motivated boundary specification and comparative analysis or sampling approaches combined with meta-analyses can make large-scale analysis more tractable [39].

The p*/ERGM we employed assumed the data were cross-sectional and thus omitted potential temporal dependencies such as a tendency for an editor to contribute after another editor contributes. Longitudinal models of network change and dynamics can also be specified [40]. Although editors’ social roles play a part in coordinating work, it may also be the case that articles can fulfill “roles” socializing editors into particular collaboration norms or introducing them to effective coordination practices. In light of the influence of these editor-focused attributes, future work should unpack how an editor’s temporal “trajectory” of contributions influences the types of roles they fulfill across articles.

We encourage other researchers to adopt p*/ERGM methods to ask better questions about multi-level and multi-theoretical processes which influence communication patterns, knowledge sharing, and distributed collaboration in collective intelligence and other socio-technical systems.

ACKNOWLEDGMENTS
We would like to thank members of the CollabLab and SONIC groups for development assistance and feedback.

REFERENCES
1. Welser, H. T., Cosley, D., Kossinets, G., Lin, A., Dokshin, F., Gay, G. and Smith, M. Finding social roles in Wikipedia. In Proc. iConference'11, ACM (2011): 122-129.
2. Kittur, A., Lee, B. and Kraut, R. Coordination in collective intelligence: the role of team structure and task interdependence. In Proc. CHI'09, ACM (2009): 1495-1504.
3. Contractor, N. S., Wasserman, S. and Faust, K. Testing multitheoretical, multilevel hypotheses about organizational networks. Academy of Management Rev, 31, 3 (2006), 681-703.
4. Monge, P. R. and Contractor, N. S. Theories of Communication Networks. Oxford U. Press, 2003.
5. Kittur, A., Chi, E., Pendleton, B. A., Suh, B. and Mytkowicz, T. Power of the Few vs. Wisdom of the Crowd: Wikipedia and the Rise of the Bourgeoisie. In Proc. CHI'07, ACM (2007).
6. Panciera, K., Halfaker, A. and Terveen, L. Wikipedians are born, not made: a study of power editors on Wikipedia. In Proc. GROUP'09, ACM (2009): 51-60.
7. Lampe, C., Wash, R., Velasquez, A. and Ozkaya, E. Motivations to participate in online communities. In Proc. CHI'10, ACM (2010): 1927-1936.
8. Rafaeli, S. and Ariel, Y. “Online motivational factors: Incentives for participation and contribution in Wikipedia.” Psychological Aspects of Cyberspace. Cambridge U. Press, 2008.
9. Benkler, Y. The wealth of networks: How social production transforms markets and freedom. Yale U. Press, 2006.
10. Ren, Y., Kraut, R. and Kiesler, S. Applying common identity and bond theory to design of online communities. Organization Studies, 28, 3 (2007).
11. Geiger, R. S. and Ribes, D. The work of sustaining order in Wikipedia: the banning of a vandal. In Proc. CSCW'10, ACM (2010): 117-126.
12. Bryant, S. L., Forte, A. and Bruckman, A. Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In Proc. GROUP'05, ACM (2005): 1-10.
13. Kriplean, T., Beschastnikh, I. and McDonald, D. W. Articulations of WikiWork: Uncovering Valued Work in Wikipedia through Barnstars. In Proc. CSCW'08, ACM (2008): 47-56.
14. Contractor, N., Monge, P. and Leonardi, P. M. Multidimensional Networks and the Dynamics of Sociomateriality: Bringing Technology Inside the Network. Int’l Journal of Communication, 5, (2011).
15. Breiger, R. L. The duality of persons and groups. Social Forces, 53, 2 (1974), 181-190.
16. Kane, G. C. and Alavi, M. Casting the net: A multimodal network perspective on user-system interactions. Info. Sys. Research, 19, 3 (2008), 253-272.
17. Kittur, A. and Kraut, R. E. Harnessing the wisdom of crowds in wikipedia: quality through coordination. In Proc. CSCW'08, ACM (2008): 37-46.
18. Palen, L. and Vieweg, S. The emergence of online widescale interaction in unexpected events: assistance, alliance & retreat. In Proc. CSCW'08, ACM (2008): 117-126.
19. Keegan, B., Gergle, D. and Contractor, N. Hot off the Wiki: Dynamics, Practices, and Structures in Wikipedia’s Coverage of the Tohoku Catastrophes. In Proc. WikiSym'11, ACM (2011): 105-113.
20. Cohen, N. The Latest on Virginia Tech, From Wikipedia. The New York Times (2007).
21. Brown, S. and Eisenhardt, K. The art of continuous change: Linking complexity theory and time-paced evolution in relentlessly shifting organizations. Administrative Science Quarterly, 42, 1 (1997), 1-34.
22. Majchrzak, A., Jarvenpaa, S. and Hollingshead, A. Coordinating expertise among emergent groups responding to disasters. Org. Science, 18, 1 (2007).
23. Feld, S. L. The focused organization of social ties. American J. of Sociology, 86, 5 (1981), 1015-1035.
24. Chen, J., Ren, Y. and Riedl, J. The effects of diversity on group productivity and member withdrawal in online volunteer groups. In Proc. CHI'10, ACM (2010): 821-830.
25. McPherson, M., Smith-Lovin, L. and Cook, J. M. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27 (2001), 415-444.
26. Gleave, E., Welser, H. T., Lento, T. M. and Smith, M. A. A conceptual and operational definition of 'social role' in online community. In Proc. HICSS'09, IEEE (2009): 1-11.
27. Yates, D., Wagner, C. and Majchrzak, A. Factors affecting shapers of organizational wikis. J. of ASIST, 61, 3 (2010), 543-554.
28. Baker, W. E. and Faulkner, R. R. Role as resource in the Hollywood film industry. American J. of Sociology, 97, 2 (1991), 279-309.
29. Hecht, B. and Gergle, D. On the localness of user-generated content. In Proc. CSCW'10, ACM (2010): 229-232.
30. Blumenstock, J. E. Size matters: word count as a measure of quality on Wikipedia. In Proc. WWW'08, ACM (2008): 1095-1096.
31. Hu, M., Lim, E. P., Sun, A., Lauw, H. W. and Vuong, B. Q. Measuring article quality in Wikipedia. In Proc. CIKM'07, ACM (2007): 243-252.
32. Robins, G., Pattison, P., Kalish, Y. and Lusher, D. An introduction to exponential random graph (p*) models for social networks. Social Networks, 29, 2 (2007), 173-191.
33. Robins, G., Snijders, T., Wang, P., Handcock, M. and Pattison, P. Recent developments in exponential random graph (p*) models for social networks. Social Networks, 29, 2 (2007), 192-215.
34. Faust, K. Centrality in affiliation networks. Social Networks, 19, 2 (1997), 157-191.
35. Wang, P., Sharpe, K., Robins, G. L. and Pattison, P. E. Exponential random graph (p*) models for affiliation networks. Social Networks, 31, 1 (2009), 12-25.
36. Handcock, M., Hunter, D., Butts, C., Goodreau, S. M. and Morris, M. statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data. J. of Stat. Software, 24, 1 (2008).
37. Borgatti, S. P. and Everett, M. G. Network analysis of 2-mode data. Social Networks, 19, 3 (1997), 243-269.
38. Faust, K. and Skvoretz, J. Comparing networks across space and time, size and species. Sociological Methodology, 32, 1 (2002), 267-299.
39. Handcock, M. S. and Gile, K. J. Modeling social networks from sampled data. Annals of Applied Statistics, 4, 1 (2010), 5-25.
40. Snijders, T. A. B., Van de Bunt, G. G. and Steglich, C. E. G. Introduction to stochastic actor-based models for network dynamics. Social Networks, 32, 1 (2010), 44-60.