(PDF) Multiple Social Network Learning a

Multiple Social Network Learning and Its Application in Volunteerism Tendency Prediction Xuemeng Song, Liqiang Nie, Luming Zhang, Mohammad Akbari, Tat-Seng Chua National University of Singapore {sxmustc, nieliqiang, zglumg}@gmail.com,

[email protected]

ABSTRACT We are living in the era of social networks, where people throughout the world are connected and organized by multiple social networks. The views revealed by diﬀerent social networks may vary according to the diﬀerent services they oﬀer. They are complimentary to each other and comprehensively characterize a speciﬁc user from diﬀerent perspectives. As compared to the scare knowledge conveyed by a single source, appropriate aggregation of multiple social networks oﬀers us a better opportunity for deep user understanding. The challenges, however, co-exist with opportunities. The ﬁrst challenge lies in the existence of block-wise missing data, caused by the fact that some users may be very active in certain social networks while inactive in others. The second challenge is how to collaboratively integrate multiple social networks. Towards this end, we ﬁrst proposed a novel model for data missing completion by seamlessly exploring the knowledge from multiple sources. We then developed a robust multiple social network learning model, and applied it to the application of volunteerism tendency prediction. Extensive experiments on real world dataset verify the eﬀectiveness of our scheme. The proposed scheme is applicable to many other domains, such as demographic inference and interest prediction. Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing Keywords Multiple Social Network Learning; Missing Data Completion; Volunteerism Tendency Prediction 1. INTRODUCTION With the explosion of social network services, more and more people are involved in multiple social networks for various purposes at the same time. It is reported Permission to make digital or hard copies of all classroom use is granted without fee provided that for profit or commercial advantage and that copies on the first page. Copyrights for components of ACM must be honored. Abstracting with credit is republish, to post on servers or to redistribute and/or a fee. Request permissions from that 52% of online adults concurrently use multiple social media services1 . Diﬀerent aspects of users are disclosed on diﬀerent social networks due to their diﬀerent emphasis. In fact, these views are complementary to each other and essentially characterize the same user from diﬀerent perspectives. As compared with single social network, appropriate aggregation of multiple social networks provides us a potential to comprehensively understand the given users [1, 29]. For example, we can learn descriptive user representation, build predictive models for user proﬁles, and recommend prescriptive actions based on complete historical behaviors. Hence, an eﬀective technique for multiple social network learning (MSNL) is highly desired. Distinguished from multi-view learning which maximizes the agreement between views using unlabeled data, MSNL works towards supervised learning. However, integration of multiple sources is non-trivial [27]. The ﬁrst tough challenge lies in how to fuse users’ heterogeneous distributed data from multiple social networks eﬀectively. One naive approach is to concatenate the feature spaces generated from diﬀerent sources into a uniﬁed feature space. Thereby, traditional machine learning models can be further applied. However, this method simply treats the conﬁdence of all data sources equally and may also lead to the curse of dimensionality. Moreover, it ignores two important facts: 1) diﬀerent aspects of users are revealed in diﬀerent social networks and are thus distributed in diﬀerent feature spaces; and 2) all these aspects tend to characterize the same users. In particular, data from multi-sources describes the same user and thus the results predicted by diﬀerent sources should be similar. Therefore, it is expected to take the source conﬁdence and source consistency into consideration. Another challenge we are facing is the data missing problem. Although some users have social accounts on multiple social networks, generally they are active on only a few of them. One simple approach to address this challenge is to discard all incomplete subjects. It is apparent that this method will dramatically reduce the training size, thereby result in overﬁtting in the model learning stage. Therefore, accurately completing missing data by jointly or part of this work for personal or utilizing multiple sources is a necessity to enhance the copies are not made or distributed learning performance. bear this notice and the full citation To address these problems, we present a scheme for this work owned by others than MSNL, which co-regulates the source conﬁdence and source permitted. To copy otherwise, or consistency. Figure 1 shows our proposed scheme comprising to lists, requires prior specific permission of three components. Given a set of users, we ﬁrst crawl

[email protected]

. WOODSTOCK ’97 El Paso, Texas USA 1 ⃝c 2015 ACM. ISBN 978-1-4503-3621-5/15/08 ...$15.00. According Paw Research Internet Project’s Social Media DOI: http://dx.doi.org/10.1145/2766462.2767726. Update 2014: http://www.pewinternet.org/ 1 Figure 1: Illustration of our proposed scheme. We ﬁrst collect and align users’ distributed data from multiple social networks. We then jointly infer the block-wise missing data based on the available data. We ﬁnally apply MSNL on the complete data. SNi , xj , and yl refer to the i-th social network, j-th user sample, and the l-th corresponding label, respectively. their historical contents and all social connections. The ﬁrst latent subpace shared by diﬀerent sources [12] and the component extracts the multi-faceted information cues to original missing data can then be derived in turn. describe a given user, including demographic information, • We empirically evaluate our proposed scheme on practical behaviors, historical posts, and proﬁles of social the application of volunteerism tendency prediction. connections. To deal with the block-wise missing data, the In addition, we develop a set of volunteer-oriented second component attempts to infer the block-wise missing features to characterize users’ volunteerism tendency. data by learning a latent space shared by diﬀerent social We have released our compiled dataset2 to facilitate networks, achieving a complete input to the next component. other researchers to repeat our experiments and verify We ﬁnally use the last component to conduct MSNL on their proposed approaches. the complete data. Particularly, we model the conﬁdence of diﬀerent data sources and the consistency among them The remainder of this paper is structured as follows, by unifying two regularization terms into our model. Section 2 brieﬂy reviews the related work. Section Based upon our proposed scheme, we introduce one 3 describes the proposed MSNL model. Missing data application scenario: volunteerism tendency prediction. completion is introduced in Section 4. Section 5 presents Volunteerism was deﬁned in [18] as long-term, planned, the set of volunteer-oriented features we developed. Section prosocial behaviors that occur within organizational settings 6 details the experimental results and analysis, followed by and can beneﬁt strangers. Persons exhibiting volunteerism our concluding remarks in Section 7. are the so-called volunteers, who serve as an important work force in modern society. Traditionally, it is intractable 2. RELATED WORK for nonproﬁt organizations (NPOs) to aimlessly recruit Although our work is distinguished from multi-view volunteers from the huge crowd. It is thus necessary learning, we can still beneﬁt from their eﬀorts. Zhang et al. to develop an automatic volunteerism tendency prediction [28] proposed an inductive multi-view multi-task learning system to alleviate the dilemma that NPOs are facing. model (regMVMT). regMVMT penalizes the disagreement In particular, we take the advantage of users’ casually of models learned from diﬀerent sources over the unlabeled distributed online data, especially from multiple social samples. Besides, the authors also studied the structured networks, which can comprehensively reveal users’ personal missing data, which is completely missing for a source in concerns, interests [5, 22] and even personality traits [20, terms of a task. In other words, if a source is available for 21]. On the other hand, the real word dataset may contain a task, then all samples will have data from this source. block-wise missing data due to some users’ inactivity in However, they overlooked the source weights and did not social networks. Moreover, the speciﬁc task also brings us pay attention to the partially structured missing data, another issue in terms of data collection and ground truth which are both what we are concerned with. Yuan et al. construction. Therefore, the proposed scheme naturally ﬁts [25] introduced an incomplete multi-source feature learning this application scenario. method, avoiding the direct inference of block-wise missing Our main contributions can be summarized in threefold: data. Particularly, the authors split the incomplete data • We propose a novel MSNL model, which is able into disjoint groups, where they conducted feature learning to model both the source conﬁdence and source independently. However, such a mechanism constrains us consistency. Speciﬁcally, we can obtain a closed-form to conduct source level analysis. Later, Xiang et al. [24] solution by taking the inverse of a linear system, which investigated multi-source learning with block-wise missing has been mathematically proven to be invertible. • We propose an approach to deal with missing data in 2 The compiled dataset is currently publicly accessible via: multiple social networks, which ﬁrst learns a common http://multiplesocialnetworklearning.azurewebsites.net/ 2 data with an application of Alzheimer’s Disease prediction formalized as follows, and proposed the iSFS model. Apart from feature-level 1∑ S analysis, the authors also conducted source-level analysis by f (X) = fs (Xs ). (2) introducing the weights of models obtained from diﬀerent S s=1 sources. However, ignoring the consistency relationships among diﬀerent models seems inappropriate. In addition, However, in reality, diﬀerent social networks always have the authors also adapted the model to handle cases where diﬀerent conﬁdence to the ﬁnal prediction, and we consider block-wise missing data exists. Diﬀerent from their work, modeling the weights of multiple sources instead of treating we infer the missing data by making full use of the available all sources equally by introducing the weight vector: α = data before applying MSNL, which is more generalizable to [α1 , α2 , · · · , αS ]T ∈ RS×1 , where αs controls the weight of other applications. model learned from s-th social network. Then the ﬁnal model is deﬁned as follows, 3. MULTIPLE SOCIAL NETWORK LEAR- ∑ S f (X) = αs fs (Xs ) NING s=1 This section details our proposed MSNL model and subject to eT α = 1, (3) derives an analytic solution by solving the inverse of a linear system, whose invertibility is proved rigorously. where e = [1, 1, · · · , 1] ∈ R T S×1 . It is worth mentioning that we do not impose the constraint of αs ≥ 0, as we 3.1 Notation want to keep both positive and negative weights. Positive We ﬁrst declare some notations. In particular, we use bold weights indicate the positive correlations of social networks capital letters (e.g. X) and bold lowercase letters (e.g. x) to with the ﬁnal results, while negative weights reﬂect negative denote matrices and vectors, respectively. We employ non- correlations between the given task and diﬀerent sources, bold letters (e.g. x) to represent scalars, and Greek letters which may contain unreliable and noisy data. (e.g. λ) as parameters. If not clariﬁed, all vectors are in For the s-th social network, we learn a linear mapping column forms. function indexed by a model ws ∈ RDs ×1 . Then the Suppose we have a set of N labeled data samples and objective function can be rewritten as follows, S ≥ 2 social networks. We compile the S social networks 2 1 ∑S ∑ with an index set C = {1, 2, · · · , S}. Let Ds and Ns y − ∑ αs Xs ws + µ S min Xs ws − Xs′ ws′ 2 denote the number of features and samples in the s-th social ws ,α 2N s=1 2N s=1 ′ s ̸=s network, s ∈ C, respectively. Let Xs ∈ RN ×Ds denote λ ∑ S the feature matrix extracted from the s-th social network. + ws 2 + β α 2 , (4) Each row represents a user sample. Then the dimension 2 s=1 2 of ∑Sfeatures extracted from all these social networks is D = s=1 Ds . The whole feature matrix can be written as X = where eT α = 1 and β is the regularization parameter, {X1 , X2 , · · · , XS } ∈ RN ×D and y = {y1 , y2 , · · · , yN }T ∈ controlling the sparsity of the solution regarding α. {1, −1}N ×1 is the corresponding label vector. 3.3 Optimization 3.2 Problem Formulations We adopt the alternating optimization strategy to solve Based on a set of data samples with S social networks, the two variables α and ws in Eqn. (4). In particular, we can learn S predictive models, where each model is we optimize one variable while ﬁxing the other one in individually and independently trained on a social network. each iteration. We keep this iterative procedure until the The ﬁnal predictive model can be strengthened via linear objective function converges. combination of these S models. Mathematically, we learn one linear mapping function fs for the s-th social network. 3.3.1 Computing α with ws fixed In addition, we assume that the mapping functions learned We denote the objective function as Γ. For simplicity, we from all social networks agree with one another as much replace y in Eqn. (4) by yeT α, as eT α = 1. With the help as possible. Particularly, we can formalize this assumption of Lagrangian, Γ can be rewritten as follows, using regularization function. Using the least square loss 1 yeT α − XWα 2 + β α 2 + δ(1 − eT α), (5) function, we have the following objective function, min α 2N 2 1 ∑S ∑ where δ is the nonnegative Lagrange multiplier and W = min y − f (X) 2 + µ fs (Xs ) − fs′ (Xs′ ) 2 fs 2N 2N s=1 ′ diag(w1 , w2 , · · · , wS ) ∈ RD×S . Taking derivative of Γ with s ̸=s respect to α, we have, λ 2 + f , (1) ∂Γ 1 2 = (yeT − XW)T (yeT − XW)α + βα − δe. (6) ∂α N where f (X) is the ﬁnal predictive model. fs (Xs ) is the Setting Eqn. (6) to zero, it can be derived that, prediction results generated from data Xs . λ and µ are the nonnegative regularization parameters that regulate the α = δM−1 e, (7) sparsity of the solution regarding fs and the disagreement among models learned from diﬀerent social networks, where respectively. If we just treat the conﬁdence of diﬀerent 1 M= (yeT − XW)T (yeT − XW) + βI. (8) social networks equally, the ﬁnal predictive model can be N 3 Since eT α = 1, we can obtain that, Algorithm 1 Alternative optimization for solving Eqn. (4) Input: X, y, λ, β, µ 1 M−1 e Output: α, w δ= , α= . (9) eT M−1 e eT M−1 e 1: Initialize (w)0 by ﬁtting each source individually on the available data. Initialize (α)0 = [ S1 , S1 , · · · , S1 ]. Obviously, M ∈ RS×S is positive deﬁnite and invertible, 2: for k = 1, 2, · · · do according to the deﬁnition. We thus can obtain the analytic 3: Compute each (α)k according to Eqn. (9). solution of α as Eqn. (9). Moreover, we note that when 4: Update (w)k according to Eqn. (13). the prediction results learned from all social networks are 5: if the objective value stops decreasing then equal, where X1 w1 = X2 w2 = · · · = XS wS , then same 6: return α = (α)k and w = (w)k weights will be assigned, i.e., α1 = α2 = · · · = αS . In 7: end if addition, Eqn. (9) tends to assign higher weight αs , if smaller 8: end for diﬀerence exists between y and Xs ws . 3.3.2 Computing ws with α fixed we need to prove that hT Lh When α is ﬁxed, we compute the derivative of Γ regarding ∑ S ∑ S ws as follows, = hTi Lij hj i=1 j=1 ∑ 1 [∑ S ∂Γ 1 2 S ∑S = αs XTs ( αs Xs ws − y) = λ h + αi Xi hi 2 + µ(S − 1) Xi hi 2 ∂ws N s=1 N i=1 i=1 µ T∑∑ ] S ∑ S ∑ ∑ S ∑ + Xs (Xs ws − Xs′ ws′ ) + λws + αi hTi XTi αj Xj hj − µ hTi XTi Xj hj , (14) N s=1 ′ s ̸=s i=1 j̸=i i=1 j̸=i [ α2s µ(S − 1) T ] = λI + XTs Xs + Xs Xs ws is always larger than zero. In fact, given an arbitrary vector N N bi , we have, ∑ S ∑ 1 αs T + (αs αs′ − µ)XTs Xs′ ws′ − Xs y, (10) b1 − b2 2 + · · · + b(S−1) − bS 2 + bS − b1 2 ≥ 0 s=1 s′ ̸=s N N ∑ S 2 ∑ S ∑ b i ≥ bTi bj . (15) where I is a Ds × Ds identity matrix. Setting Eqn. (10) i=1 i=1 j̸=i to zero and rearranging the terms, all ws ’s can be learned jointly by the following linear system, Therefore, as S ≥ 2, we have the following inequality, ∑ S ∑ S Lw = t µ(S − 1) Xi hi 2 ≥ µ Xi hi 2      i=1 i=1 L11 L12 L13 · · · L1S w1 t1  L21 L22 L23 · · · L2S   w2   t2  ∑ S ∑      ≥µ (Xi hi )T Xj hj . (16)  L31 L32 L33 · · · L3S         w3  =  t3  , (11)  .. .. .. .. ..   ..   ..  i=1 j̸=i  . . . . .  .   .  Besides, we know that, LS1 LS2 LS3 · · · LSS wS tS ∑ S ∑ S ∑ αi Xi hi 2 + αi hTi XTi αj Xj hj where L ∈ R D×D is a sparse block matrix with S × S i=1 i=1 j̸=i blocks, w = [w1T , w2T , · · · , wST ]T ∈ RD×1 and t = 2 [tT1 , tT2 , · · · , tTS ]T ∈ RD×1 are both sparse block vectors with ∑ S S = 1 αi Xi hi 2 + 1 ∑ αi Xi hi ≥ 0. (17) S × 1 blocks. ts , Lss and Lss′ are deﬁned as follows, 2 2 i=1 i=1   ts = αs N XTs y, Based upon Eqn. (16) and Eqn. (17), we have that, α2 −µ 2 hT Lh ≥ λ h . µS Lss = λI + sN XTs Xs + XTs Xs , (12)   αs αs′ −µ N (18) Lss′ = N XTs Xs′ . As h ̸= 0, h Lh is always larger than zero. Consequently, T Technically, t can be treated as a constant matrix as α is L is invertible. The overall procedures for alternating ﬁxed. It is worth noting that L is symmetric as Lss′ = LTs′ s . optimization are summarized in Algorithm 1. As each If we can prove that L is invertible, then we can derive the iteration can decrease Γ, whose lower bound is zero, we can closed-form solution of w as follows, guarantee the convergence of Algorithm 1 [7, 16]. w = L−1 t. (13) 4. MISSING DATA COMPLETION In this section, we deal with a more challenging and We now show L is invertible by proving that L is a positive- realistic situation, where block-wise missing data exists, deﬁnite matrix. Let h = [hT1 , hT2 , · · · , hTS ]T ∈ RD×1 ̸= 0 be and propose an approach for multiple social network data an arbitrary block vector, where hi ∈ RDi ×1 , i ∈ C. Then completion (MSNDC). In such situations, user samples may 4 not be active in all social networks, which leads to the block- wise data missing. Suppose we have S data sources in total and each sample has at least one data source available. We employ the subset Ci ⊆ C to indicate the presence of each source and the signature of a speciﬁc social network combination. Based on these combinations, all the data samples can be split into multiple exclusive sets, where each set corresponds to a combination. Figure 2 illustrates the incomplete data in our dataset. As can be seen, all users have complete features from SN1 , while some users miss data in SN2 or SN3 . Therefore, our dataset can be split by four exclusive social network combinations: C1 = {1, 2}, C2 = {1, 2, 3}, C3 = {1, 3}, C4 = {1}. Inspired by [10], we use Non-negative Matrix Factorization Figure 2: Illustration of the incomplete data from (NMF) to explore the latent spaces that are shared by three sources. XCs i denotes the samples generated diﬀerent social networks, and further infer the missing data from social network s that are only available in the based upon these latent spaces. It is reasonable to assume social network combination of Ci . that the data from diﬀerent social networks about the same user shares certain latent features. We employ XCs i ∈ the latent representation, where the authors in [10] just RNCi ×Ds to denote the samples generated from the s-th apply cluster algorithms directly to the latent representation social network. It only contains samples that are available of data instead of the original data. This is due to two in the set of social networks Ci , where NCi stands for the considerations. One is that we believe the value of original number of these samples. We use Us ∈ Rz×Ds to represent known data is higher than the latent representation. The the latent basis matrix for the s-th social network, and PCs i ∈ other one is that we need to preserve the heterogeneity RNCi ×z to denote the corresponding latent representation of among data from diﬀerent sources to ﬁt the MSNL model. feature matrix XCs i . z is the dimension of the shared latent space of diﬀerent social networks. The intuitive assumption 4.1 Optimization is that for the samples available in both the s-th and s′ -th In order to increase the eﬃciency of the iterative procedure, social networks, their corresponding latent representations we initialize Us by optimizing the following objective should also be quite similar. In particular, we impose this function, constraint to NMF as follows, 2 min X1{1,2,3} − P{1,2,3} U1 +ν P{1,2,3} 1 + η U1 1 PCs i = PCs′i = PCi , (19) Us ≥0 2 ′ ′ where s ̸= s , s ∈ Ci , and s ∈ Ci . We thus learn the shared + X2{1,2,3} − P{1,2,3} U2 +η U2 1 subspaces by the following objective function, 2  X{1}    2 + X3{1,2,3} − P{1,2,3} U3 +η U3 1 . (21) 1 P{1}  {1,2}   {1,2}   X 1  P We then alternatively optimize Us and Ps until the min  {1,3}  −  {1,3}  U1 +ν P1 1 + η U1 1  Us ≥0  X  P objective function converges. Speciﬁcally, we employ the Ps ≥0 1 X{1,2,3} 1 P{1,2,3} greedy coordinate descent (GCD) approach [9], which [ ] [ ] 2 has been proven to be tremendously fast to solve NMF {1,2} X2 P{1,2} decomposition with L1-norm regularization. Finally, we + {1,2,3} − {1,2,3} U2 +ν P2 1 + η U2 1 obtain Ps , Us , s ∈ C, based on which we can infer the X2 P [ ] [ ] 2 missing data as follows, {1,3} X3 P{1,3} ˆ Cs i = PCi Us , + {1,2,3} − {1,2,3} U3 +ν P3 1 + η U3 1 , X ∀s ∈ / Ci . (22) X3 P (20) Algorithm 2 summarizes the overall procedures for alternating optimization. where ν and η are the nonnegative tradeoﬀ parameters for the regularizations. Similarly, we employ the alternating 5. APPLICATION: VOLUNTEERISM TEN- optimization strategy to solve the optimization in Eqn. (20). To be more speciﬁc, we ﬁrst initialize Us and compute DENCY PREDICTION the optimal Ps . Afterwards, Ps is updated based on the In this work, we apply the proposed scheme to an computed Us . We keep this iterative procedure until the application scenario: volunteerism tendency prediction. In objective function converges. modern society, volunteers are extremely crucial to NPOs The proposed approach diﬀers from [10] in the following to sustain their continuing operations. The discovery three aspects. First, MSNDC is generalized to handle the of users’ volunteerism tendency can signiﬁcantly facilitate more challenging scenario where data samples are extracted the recruitment of volunteers for NPOs, which can save from more than two social networks. Second, apart considerable cost to ﬁnd the potential volunteers. In from regulating the latent representation matrix, we also particular, we cast the problem of volunteerism tendency incorporate the regularization on the latent basis matrix. prediction as a user binary classiﬁcation. If the predicted Third, we further derive the original missing data from tendency score of a given user is larger than a pre-deﬁned 5 Algorithm 2 Alternative optimization for solving Eqn. (20) provided by Quora. Initially, we selected two popular users Input: X1 , X2 , X3 , ν, η as the seed users and then explored all their neighboring Output: ˆ X connected users. We applied similar exploration approach (0) 1: Initialize Us according to Eqn. (21). to all other non-seed users. In the end, we collected 172, 235 2: for k = 1, 2, · · · do users’ proﬁles and only retained those who have accounts in 3: for s = 1, 2, · · · , S do Facebook, Twitter and LinkedIn. (k) 4: Compute each Ps according to Eqn. (20) via GCD approach. 5.2 Ground Truth Construction (k) 5: Update Us according to Eqn. (20) via GCD Based on these candidates, we launched a crawler to approach. collect their historical social contents, including their basic 6: if the objective value stops decreasing then proﬁles, social posts and relations. However, the traditional (k) (k) 7: return Us = Us and Ps = Ps web-based crawler is not applicable to Facebook due to 8: end if its dynamic loading mechanism. We thus resorted to the 9: end for Selenium7 to simulate users’ click and scroll operations 10: end for on a FireFox browser and load users’ publicly available 11: for j = 1, 2, · · · , S do information. We limited the access rate to one request per 12: for Cq ⊆ C do second to avoid being blocked by the robot checkers. It is 13: if j ∈ Cq then worth mentioning that the data we collected is all publicly 14: Xˆ Cq = XCq . available. On the other hand, due to the privacy constraint, j j 15: else we could not access uses’ social relations in Facebook and 16: Infer Xˆ Cq according to Eqn. (22). LinkedIn. We hence only collected users’ followee relations j 17: end if in Twitter. 18: end for In order to improve the quality of our dataset, we 19: end for employed three annotators to ﬁnalize our ground truth. As users tend to provide more complete and reliable proﬁles in LinkedIn, we guided the annotators to study threshold γ, we regard this user as a volunteer. In this work, the LinkedIn proﬁles of candidate users, and determine we explore three popular social networks: Twitter, Facebook whether they are “volunteers” by majority votes. To ensure and LinkedIn, as they are representative of a public, private, a uniformly labeling procedure, we provided them a piece and professional social network, respectively. Besides, it of guideline. Given a user’s LinkedIn proﬁle, we classiﬁed is known that users exhibit diﬀerent aspects on diﬀerent the user as a volunteer if and only if this user lists his/her social networks [1], and the combination of these three social volunteer experiences in the section “Volunteer experience networks would help to better characterize user behaviors on & Causes” or section “Experience”. Candidates who do social platforms. not satisfy the above two criteria were tagged as non- volunteers. We focused on LinkedIn to determine whether 5.1 Multiple Social Accounts Mappings users are volunteers because the volunteer experiences in To represent the same users with multiple sources, we need LinkedIn are the most straightforward evidence to identify to ﬁrst tackle the problem of “Social Account Mapping”, volunteers. It should be noted that those who do not which aims to align the same users across diﬀerent social mention their volunteer experiences in LinkedIn are not networks by linking their multiple social accounts [1]. To necessarily classiﬁed as “non-volunteers”. However, the accurately establish this mapping, we employ the emerging absence of these mentions, at least, reveals their limited social services such as About.me3 and Quora4 , where interests and low enthusiasm in volunteerism. Therefore, they encourage users to explicitly list their multiple social in our work, we broadly deﬁned users as “non-volunteers” if accounts on one proﬁle. they do not mention their relevant volunteerism experiences We proposed two strategies to collect data from About.me. in LinkedIn. Table 1 lists the statistics of our dataset. We obtained • Keyword search: We searched About.me with the the data for 1, 425 volunteers and 4, 011 non-volunteers keyword “volunteer” and obtained 4, 151 volunteer according to the aforementioned strategies. The crawling candidates. was conducted between 22nd August to 11th September, • Random select: We employed Random API5 , 2013. Here we only selected a subset of non-volunteer data provided by About.me, to collect non-volunteers. This and made the dataset balanced to avoid the training bias. To API returns a speciﬁed number of random user proﬁles. facilitate this line of research, this dataset has been released Finally, we harvested 1, 867 non-volunteer candidates. after certain privacy preservation processing. It is worth mentioning that volunteers may be present However, in reality, not all users are active enough on all in these random users. social networks. To ensure the data quality, we treated those To enlarge our dataset, we also collected candidates from inactive users as missing with respect to a speciﬁc social Quora by the breadth-ﬁrst-search method. Particularly, we network. Therefore, there exists block-wise missing data in took advantage of both the follower and followee6 relations our dataset. In particular, we treated a user as missing in Twitter or Facebook, if this user has less than 10 historical 3 https://about.me/ social posts. In addition, due to the absence of social post 4 http://quora.com/ 5 http://about.me/developer/api/docs/ 6 7 If A follows B, then A is B’s follower and B is A’s followee. http://docs.seleniumhq.org/download/ 6 Table 1: Statistics of our dataset. Non- mapping from words to 72 categories9 . Given a document, Data Volunteer LIWC computes the percentage of words in each category volunteer and represents it as a vector of 72 dimensions. Twitter proﬁles ∼1.5k ∼4k To capture the key aspects of LIWC features, we selected Twitter posts ∼559k ∼1m the top 5 dimensions as the representative LIWC features Twitter followees’ proﬁles ∼902k ∼3m according to the information gain ratio. Considering Facebook proﬁles ∼1.5k ∼4k that the emotions for individuals may also aﬀect users’ Facebook posts ∼83k ∼338k volunteerism tendency, we additionally selected two categories LinkedIn proﬁles ∼1.5k ∼4k from LIWC: positive emotion and negative emotion. Besides, we also utilized the positive-negative emotion ratio to mechanism in LinkedIn, we treated a user as missing8 in further reﬂect users’ emotional states. Let L(•) represent LinkedIn if the word count of this user’s proﬁle is less than the percentage of users’ words in certain LIWC category. 50. Figure 3 shows the statistics of our incomplete data. As The positive-negative emotion ratio is deﬁned as, can be seen, about 50% of users have complete data from all L(pos) + ξp three social networks. 1% and 47% of users only miss the P Nemo = L(pos)log , (23) data either from Facebook and LinkedIn, while 2% of users L(neg) + ξn miss the data from both of them. where ξp and ξn are introduced to avoid the situation: individuals have no positive or negative emotional word. They are both set as 0.0001. In total, we have 16 dimension LIWC features, extracted from Twitter and Facebook. User topics. According to our observation, volunteers may have, on average, a higher probability of talking about topics such as social caring or giving back, while the non-volunteers may mention other topics more often. This motivates us to explore the topic distributions of users’ social posts to identify volunteers. We generated topic distributions using Latent Dirichlet Allocation (LDA) [4], which has been widely found to be useful in latent topic modeling [8, 23]. Based on perplexity [14] metric frequently utilized to ﬁnd the optimal number of hidden topics, we ultimately obtained 52, 26, 42 dimensional topic-level features over users’ Twitter, Figure 3: Statistics of the incomplete data. Tw: Facebook and LinkedIn data, respectively. Users with Twitter data only; Tw+Fb: Users with Contextual topics. We deﬁne users’ contextual topics Twitter and Facebook data only; Tw+In: Users as the topics of users’ connections. We believe that with Twitter and Linkedin data only; Tw+Fb+In: the contextual topics intuitively reﬂect the contexts of Users without missing data. users. “He that lies down with dogs must rise up with ﬂeas” tells us that the context signiﬁcantly aﬀects 5.3 Features a user’s tendency. Particularly, we studied followees To capture users’ volunteerism tendency, we extracted a and retweeting10 connections on Twitter because of their rich set of volunteer-oriented features. intuitive reﬂection of topics that users concern. As the bio descriptions are usually provided by users to brieﬂy 5.3.1 Demographic Characteristics introduce themselves and may indicate users’ summarized interests, we integrated the bios of a user’s followees or those The study in [19] reported that some demographic whose tweets are retweeted by this user into two kinds of characteristics, such as education and income level, are bio documents, on which we further applied LDA model. strong indicators for volunteerism. This study inspires us We utilized the perplexity to ﬁx the dimensions of topic- to extract demographic characteristics from users’ proﬁles, level features over followees’ bio documents and retweetings’ especially the Facebook and LinkedIn proﬁles. In our work, bio documents as 40 and 20, respectively. In this work, we we explored users’ demographic characteristics, including only explored the contextual topics in Twitter, since we were Gender, Relationship status, Education level, and Number unable to crawl the connections’ proﬁles in LinkedIn and the of social connections. bio descriptions are usually missing in Facebook. 5.3.2 Linguistic Features 5.3.3 Behavior-based features We also extracted linguistic features, including Linguistic This kind of features is characterized by users’ posting Inquiry and Word Count (LIWC) features, user topics and behavior patterns and networking behavior patterns. The contextual topics. former focuses on the written style of users’ social posts, while the latter captures their egocentric network features. LIWC features. LIWC is widely-used to analyze the psycho-linguistic transparent lexicon. It plays an important Posting behavior patterns. Posting behavior patterns role in predicting users’ personality [2, 15]. The main have been investigated in many scenarios, spanning from component of LIWC is a directory which contains the 9 http://www.liwc.net/ 8 10 Here we exclude the contents of section “Volunteer If A broadcasts a tweet posted by B, then B is A’s a experience & Causes” and section “Experience”. retweeting user. 7 age estimation to social spammers discovery [3, 13]. These SVM: We chose the learning formulation with the kernel patterns can be used to depict users’ participation in of radial-basis function. We implemented this method based information diﬀusion, which correlates with volunteerism on LIBSVM [6]. tendency much. RLS: Regularized least squares model [11] aims to 1 2 2 On one hand, we employed the fraction of users’ posts minimize the objective function of 2N y − Xw + λ2 w . containing certain behaviors, including emoticons, slang In fact, the RLS model can be deduced from MSNL via words, acronyms, hashtags, URLs, and user mentions, to the settings of α = [ S1 , S1 , · · · , S1 ]T , µ = 0 and β = 0. intuitively reﬂect users’ engagement in topic discussion and iSFS: The third baseline is the incomplete source-feature social interaction. On the other hand, we observed that selection model proposed in [24]. This model only assigns users’ posting behaviors in social networks can be classiﬁed weights to models learned from diﬀerent social networks but into a few categories. For example, posts in Twitter can ignores the relationships among them. We can derive iSFS be classiﬁed into two categories, Ctw = {tweets, retweets}, from MSNL by making µ = 0. while posts in Facebook can roughly be split into eight types: regMVMT: The fourth baseline is the regularized multi- Cf b = {share link, share sideo, share status, share photo, view multi-task learning model [28]. This model only change photo, repost, post, tagged}. The distributions over regulates the relationships among diﬀerent views but fails users’ posts on these categories also reﬂect their participation to take the source conﬁdence into account. We can derive in information diﬀusion, revealing whether a given user regMVMT from MSNL by making α = [ S1 , S1 , · · · , S1 ]T . tend to share information in social networks. When it comes to Linkedin, we utilized the proﬁle completeness to Table 2: Performance of diﬀerent models(%). characterize users’ behaviors. Based on our observation, Approaches F1-measure P-value we found that volunteers tend to provide more information for all the sections. This not only reﬂects volunteers’ SVM 83.11 0.038 active participation in LinkedIn but also signals their self- RLS 82.82 0.025 conﬁdence and openness to public. Proﬁle completeness is regMVMT 84.07 0.173 deﬁned as a boolean vector over six dimensions to denote iSFS 84.72 0.281 the presence of the six common sections in LinkedIn proﬁles: MSNL 85.59 - summary, interest, language, education, skill and honor. We excluded the sections on experience and Table 2 shows the performance comparison between volunteer experience & causes, because the ground baselines and our proposed MSNL. We noticed that truth is built on these two sections. MSNL signiﬁcantly outperforms the SVM and RLS. This Egocentric network patterns. We also studied users’ implies that the information on multiple social networks social behaviors from their egocentric networks. Intuitively, are complementary and characterize users’ volunteerism we believe that users belong to certain class tend to be tendency consistently. This also proves that the correlations connected with several class-speciﬁc accounts, as it goes for of diﬀerent social networks with the task of volunteerism that “birds of a feather ﬂock together”. Therefore, volunteers tendency prediction cannot be treated equally. In addition, should interact with some typical accounts in social media. MSNL achieves better performance, as compared with The set of typical accounts is denoted as F V . Inspired by iSFS and regMVMT, which are the derivations of MSNL. [17], we measured the degree of a user’s correlation with This demonstrates that both the source conﬁdence and the volunteerism by three features: the frequency and fraction source consistency deserve particular attention. of a user’s “friends” that belong to F V as well as the total 6.2 On Data Completion Comparison number of “friends”. In particular, we treated both the followees and retweetings as the “friends” of users in Twitter. We further evaluated the component for missing data To construct the F V , we utilized the Twitter proﬁle completion with the following three baseline methods. repository Wefollow11 , which allows us to ﬁnd the most Remove: This method eliminates all data samples that prominent people given a particular category. By crawling are not complete. prominent users falling into categories of Nonprofit, Charity, Average: This method imputes the missing features with Volunteer, NGO, Community Service, Social Welfare and the average values of the corresponding feature items. Christian from Wefollow, we obtained 23, 285 accounts. KNN: The missing data is inferred by averaging its K- nearest neighbors. K is experimentally set as 1. 6. EXPERIMENTS Table 3: Performance of diﬀerent models over We conducted extensive experiments to comparatively diﬀerent data completion strategies. verify our proposed scheme from various angles over a Approaches SVM RLS MSNL system equipped with Intel i72.60 GHz CPU, and 8 GB Remove 74.91 74.66 81.81 memory. In particular, we launched 10-fold cross validation Average 82.09 81.99 85.43 for each experiment, and reported the average performance. KNN 82.60 82.22 85.55 Each fold involves 2, 249 training and 250 testing samples. MSNDC 83.11 82.82 85.59 6.1 On Model Comparison Table 3 shows the performance of diﬀerent models over We compared MSNL with four baselines. Before that, diﬀerent data completion strategies. It can be seen that the data was completed by MSNDC. We also performed MSNDC outperforms the other strategies. Additionally, signiﬁcant test to validate the eﬀectiveness of MSNL. removing all incomplete data samples achieves the worst performance, which may be caused by the fact that it 11 http://wefollow.com/ introduces training bias, making the dataset unbalanced and 8 reduces the size of training dataset. We found that the Table 5: Hot topics discussed by volunteers. percentage of volunteer samples decreases from 50% to 40% Followee and retweeting: contextual topics; Self: after ﬁltering out all incomplete data samples. user topics. Data source Topic words 6.3 On Feature Comparison • public, politics, rights, development Followee To examine the discriminative features we extracted, we • editor, global, journalist, university conducted experiments over diﬀerent kinds of features using • global, nonproﬁt, change, community Retweeting MSNL. We also performed signiﬁcant test to validate the • health, education, learning, university advantage of combining multiple social networks. Table 4 • woman, help, education, child Self comparatively shows the performance of MSNL in terms • volunteer, nonproﬁt, support of diﬀerent feature conﬁgurations. It can be seen that the linguistic features achieves the best performance, as most discriminative features evaluated by Section 6.3 are all compared against demographic characteristics and behavior- extracted from Twitter. based features. This reveals that volunteerism tendency is better reﬂected by their social contents, including their Table 6: Performance of diﬀerent social network own social posts and the self-descriptions of their social combinations(%). Facebook∗ and LinkedIn∗ both connections. This also implies that users with volunteerism refer to the complete data, whose missing data is pre-inferred. F1: F1-measure. Table 4: Performance of diﬀerent features(%). Social network combinations F1 p-value Features F1-measure Twitter 82.35 4.2e-2 Demographic characteristics 68.43 Facebook∗ 73.53 5.0e-7 Linguistic features 80.06 LinkedIn∗ 74.49 3.1e-7 User topics 75.04 Twitter+Facebook∗ 83.67 1.1e-1 Contextual topics 78.14 Twitter+LinkedIn∗ 83.84 1.4e-1 LIWC 68.48 Facebook∗ +LinkedIn∗ 76.29 6.0e-6 Behavior-based features 78.52 Twitter+Facebook∗ +LinkedIn∗ 85.59 - Posting behavior patterns 69.83 Egocentric network patterns 75.91 6.5 Size Varying of Positive Samples In order to verify the usefulness of our model on real tendency may talk about related topics and follow or world dataset, where the volunteers should account for a retweet related social accounts. In addition, we found minority portion of user population, we tuned the fraction of that contextual topics are more discriminative as compared volunteer samples in our dataset. In particular, we fed x%, to users’ own topics. This may be due to the fact that x ∈ [5, 50], of volunteer samples to our model with stepsize users’ self-descriptions are of more value and contain less 5%. Figure 4 shows the F1-measure with respect to diﬀerent noise than users’ tweets. Some hot topics discussed by fraction of volunteer samples of diﬀerent models. As can be volunteers are given in Table 5. Besides, the egocentric seen, our model can achieve satisfactory performance even network patterns also play a dominant role in our task. This when volunteer samples only accounts for 5% of the whole implies that one’s social connections indeed reﬂect the user’s samples. This demonstrates that the proposed MSNL personal concerns to a large extent. model is not sensitive to the percentage of positive samples. Whereas, SVM and RLS are relatively more sensitive to 6.4 On Source Comparison the fraction of volunteer samples in dataset. To demonstrate the descriptiveness of multiple social network integration, we conducted experiments over various source combinations. Notably, data from Facebook and LinkedIn is incomplete and we need to infer the block-wise missing data ﬁrst taking advantage of the complete data samples from Twitter. Table 6 shows the performance of MSNL over diﬀerent social network combinations. We noted that the more sources we incorporate, the better the performance can be achieved. This implies the complementary relationships rather than mutual conﬂicting relationships among the sources. Moreover, we found that aggregating data from all these three social networks can achieve signiﬁcantly Figure 4: F1-measure at diﬀerent fraction of better performance as compared to each of the single volunteer samples. source. Additionally, as the performance obtained from diﬀerent single social networks are not the same, this 6.6 Complexity Discussion validates that incorporating the conﬁdence of diﬀerent In order to analyze the complexity of MSNL, we need social networks to MSNL is reasonable. Interestingly, we to solve the time complexity in terms of constructing observed that MSNL over Twitter alone achieves the much M, L and t as deﬁned in Eqn. (8) and Eqn. (12), and better performance, as compared to that over LinkedIn or computing the inverse of M and L. Assume D ≫ S, Facebook alone. This may be caused by the fact that the the construction of matrix M has a time complexity of 9 O(N DS), and the construction of matrix L has a time 8. REFERENCES complexity of O(N D2 ). Due to the fact that the cost of [1] F. Abel, E. Herder, G.-J. Houben, N. Henze, and D. Krause. Cross-system user modeling and personalization on the social matrix multiplications (XTs Xs′ ) and that of constructing web. UMUAI, 2013. t involved in Eqn. (12) remain the same for all iterations [2] B. Bazelli, A. Hindle, and E. Stroulia. On the personality traits and L is symmetric, we can save much practical time cost. of stackoverﬂow users. In ICSM, 2013. Also, using the standard method, computing the inverse of [3] F. Benevenuto, T. Rodrigues, V. Almeida, J. Almeida, and M. Gon¸ calves. Detecting spammers and content promoters in two core matrices, M and L, has the complexity of O(S 3 ) online video social networks. In SIGIR, 2009. and O(D3 ), respectively. Furthermore, using the method of [4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet Coppersmith and Winogard, the time cost can be bounded allocation. JMLR, 2003. by O(S 2.376 ) and O(D2.376 ) [26], respectively. We note that [5] D. Carmel, N. Zwerdling, I. Guy, S. Ofek-Koifman, N. Har’El, I. Ronen, E. Uziel, S. Yogev, and S. Chernov. Personalized the speed bottleneck lies in the number of features and the social search based on the user’s social network. In CIKM, number of social networks instead of the number of data 2009. samples. As S and D are usually small, especially S, MSNL [6] C.-C. Chang and C.-J. Lin. Libsvm: a library for support should be eﬃcient in time complexity. vector machines. TIST, 2011. [7] Y. Gao, M. Wang, Z.-J. Zha, J. Shen, X. Li, and X. Wu. To validate the practical eﬃciency of the proposed MSNL Visual-textual joint relevance learning for tag-based social model, we conducted a set of experiments. The comparison image search. TIP, 2013. of average time consumption of diﬀerent models is shown in [8] J. Guo, G. Xu, X. Cheng, and H. Li. Named entity recognition Table 7. As can be seen, MSNL shows superiority over in query. In SIGIR, 2009. [9] C.-J. Hsieh and I. S. Dhillon. Fast coordinate descent methods SVM in terms of the time cost, which takes only 19% with variable selection for non-negative matrix factorization. In of the time that SVM uses. By careful observation, we SIGKDD, 2011. observed that MSNL converges very fast, which on average [10] S.-Y. L. Y. Jiang and Z.-H. Zhou. Partial multi-view clustering. takes about 20 iterations. Even though MSNL takes more In AAAI, 2014. [11] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. A time than RLS and regMVMT due to the consideration method for large-scale l1-regularized least squares problems of source consistency and source conﬁdence, it improves the with applications in signal processing and statistics. J-STSP, performance in terms of F1-measure. 2007. [12] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature. [13] K. Lee, J. Caverlee, and S. Webb. Uncovering social spammers: Table 7: Time cost of diﬀerent models (%). social honeypots+ machine learning. In SIGIR, 2010. Approach Total (s) Train (s) Test (s) [14] D. Li, B. He, Y. Ding, J. Tang, C. Sugimoto, Z. Qin, E. Yan, SVM 2.0550 1.8211 0.2339 J. Li, and T. Dong. Community-based topic modeling for social tagging. In CIKM, 2010. RLS 0.0639 0.0631 0.0008 [15] D. Markovikj, S. Gievska, M. Kosinski, and D. Stillwell. Mining regMVMT 0.0605 0.0595 0.0006 facebook data for predictive personality modeling. In ICWSM, 2013. iSFS 0.5565 0.5557 0.0008 [16] L. Nie, Y.-L. Zhao, X. Wang, J. Shen, and T.-S. Chua. MSNL 0.3936 0.3929 0.0007 Learning to recommend descriptive tags for questions in social forums. TOIS, 2014. [17] M. Pennacchiotti and A.-M. Popescu. Democrats, republicans and starbucks aﬃcionados: user classiﬁcation in twitter. In 7. CONCLUSIONS AND FUTURE WORK SIGKDD, 2011. This paper presented a novel scheme for multiple social [18] L. A. Penner. Dispositional and organizational inﬂuences on sustained volunteerism: An interactionist perspective. JSI, network learning. This scheme takes the source conﬁdence 2002. and source consistency into consideration by introducing [19] L. A. Penner. Volunteerism and social problems: Making things regularization to the objective function. We further better or worse? JSI, 2004. demonstrated that the proposed scheme, designed for [20] D. Quercia, M. Kosinski, D. Stillwell, and J. Crowcroft. Our twitter proﬁles, our selves: predicting personality with twitter. complete data, is also able to handle the real and more In SocialCom, 2011. challenging cases where there exists block-wise missing [21] D. Quercia, R. Lambiotte, D. Stillwell, M. Kosinski, and data. In particular, before feeding the data into the J. Crowcroft. The personality of popular facebook users. In CSCW, 2012. proposed MSNL model, we inferred the missing data via [22] X. Shen, B. Tan, and C. Zhai. Implicit user modeling for NMF technique. Furthermore, we practically evaluated the personalized search. In CIKM, 2005. proposed scheme in an interesting scenario of volunteerism [23] X. Wei and W. B. Croft. Lda-based document models for tendency prediction. We developed a set of volunteer- ad-hoc retrieval. In SIGIR, 2006. oriented features to characterize users’ volunteerism tendency. [24] S. Xiang, L. Yuan, W. Fan, Y. Wang, P. M. Thompson, and J. Ye. Multi-source learning with block-wise missing data for Experimental results demonstrated the eﬀectiveness of our alzheimer’s disease prediction. In SIGKDD, 2013. proposed scheme and veriﬁed the advantages of utilizing [25] L. Yuan, Y. Wang, P. M. Thompson, V. A. Narayan, and J. Ye. multiple social network over a single source. Currently, we Multi-source learning for joint analysis of incomplete multi-modality neuroimaging data. In SIGKDD, 2012. only consider solving a single task in the proposed scheme. [26] D. Zhai, H. Chang, S. Shan, X. Chen, and W. Gao. Multiview In the future, we will extend our work to the context of metric learning with global consistency and local smoothness. multiple task learning. TIST, 2012. [27] D. Zhang, F. Wang, and L. Si. Composite hashing with multiple information sources. In SIGIR, 2011. [28] J. Zhang and J. Huan. Inductive multi-task learning with Acknowledgments multiple view data. In SIGKDD, 2012. [29] X. Zhu, Z.-Y. Ming, X. Zhu, and T.-S. Chua. Topic hierarchy This research is supported by the Singapore National construction for the organization of multi-source user generated Research Foundation under its International Research Centre contents. In SIGIR, 2013. @ Singapore Funding Initiative and administered by the IDM Programme Oﬃce. 10

(PDF) Multiple Social Network Learning and Its Application in Volunteerism Tendency Prediction