Multiple Social Network Learning and Its Application in Volunteerism Tendency Prediction

Xuemeng Song, Liqiang Nie, Luming Zhang, Mohammad Akbari, Tat-Seng Chua
National University of Singapore
{sxmustc, nieliqiang, zglumg}@gmail.com, [email protected], [email protected]
ABSTRACT

We are living in the era of social networks, where people throughout the world are connected and organized by multiple social networks. The views revealed by different social networks may vary according to the different services they offer. They are complementary to each other and comprehensively characterize a specific user from different perspectives. As compared to the scarce knowledge conveyed by a single source, appropriate aggregation of multiple social networks offers us a better opportunity for deep user understanding. The challenges, however, co-exist with opportunities. The first challenge lies in the existence of block-wise missing data, caused by the fact that some users may be very active in certain social networks while inactive in others. The second challenge is how to collaboratively integrate multiple social networks. Towards this end, we first propose a novel model for missing data completion that seamlessly explores the knowledge from multiple sources. We then develop a robust multiple social network learning model and apply it to the application of volunteerism tendency prediction. Extensive experiments on a real-world dataset verify the effectiveness of our scheme. The proposed scheme is applicable to many other domains, such as demographic inference and interest prediction.

Categories and Subject Descriptors

H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

Keywords

Multiple Social Network Learning; Missing Data Completion; Volunteerism Tendency Prediction

(c) 2015 ACM. ISBN 978-1-4503-3621-5/15/08. DOI: http://dx.doi.org/10.1145/2766462.2767726

1. INTRODUCTION

With the explosion of social network services, more and more people are involved in multiple social networks for various purposes at the same time. It is reported that 52% of online adults concurrently use multiple social media services[1]. Different aspects of users are disclosed on different social networks due to their different emphases. In fact, these views are complementary to each other and essentially characterize the same user from different perspectives. As compared with a single social network, appropriate aggregation of multiple social networks provides us a potential to comprehensively understand the given users [1, 29]. For example, we can learn descriptive user representations, build predictive models for user profiles, and recommend prescriptive actions based on complete historical behaviors. Hence, an effective technique for multiple social network learning (MSNL) is highly desired. Distinguished from multi-view learning, which maximizes the agreement between views using unlabeled data, MSNL works towards supervised learning.

However, integration of multiple sources is non-trivial [27]. The first tough challenge lies in how to effectively fuse users' heterogeneous data distributed over multiple social networks. One naive approach is to concatenate the feature spaces generated from different sources into a unified feature space, to which traditional machine learning models can then be applied. However, this method treats the confidence of all data sources equally and may also lead to the curse of dimensionality. Moreover, it ignores two important facts: 1) different aspects of users are revealed in different social networks and are thus distributed in different feature spaces; and 2) all these aspects tend to characterize the same users. In particular, data from multiple sources describes the same user, and thus the results predicted from different sources should be similar. Therefore, it is expected to take both source confidence and source consistency into consideration. Another challenge we are facing is the missing data problem. Although some users have social accounts on multiple social networks, they are generally active on only a few of them. One simple approach to address this challenge is to discard all incomplete subjects. It is apparent that this method will dramatically reduce the training size and thereby result in overfitting in the model learning stage. Therefore, accurately completing missing data by jointly utilizing multiple sources is a necessity to enhance the learning performance.

To address these problems, we present a scheme for MSNL, which co-regulates the source confidence and source consistency. Figure 1 shows our proposed scheme, comprising three components. Given a set of users, we first crawl their historical contents and all social connections. The first component extracts the multi-faceted information cues to describe a given user, including demographic information, practical behaviors, historical posts, and profiles of social connections. To deal with the block-wise missing data, the second component attempts to infer the missing data by learning a latent space shared by different social networks, achieving a complete input to the next component. We finally use the last component to conduct MSNL on the complete data. Particularly, we model the confidence of different data sources and the consistency among them by unifying two regularization terms into our model.

Figure 1: Illustration of our proposed scheme. We first collect and align users' distributed data from multiple social networks. We then jointly infer the block-wise missing data based on the available data. We finally apply MSNL on the complete data. SN_i, x_j, and y_l refer to the i-th social network, the j-th user sample, and the l-th corresponding label, respectively.

Based upon our proposed scheme, we introduce one application scenario: volunteerism tendency prediction. Volunteerism was defined in [18] as long-term, planned, prosocial behavior that occurs within organizational settings and can benefit strangers. Persons exhibiting volunteerism are the so-called volunteers, who serve as an important workforce in modern society. Traditionally, it is intractable for nonprofit organizations (NPOs) to aimlessly recruit volunteers from the huge crowd. It is thus necessary to develop an automatic volunteerism tendency prediction system to alleviate the dilemma that NPOs are facing. In particular, we take advantage of users' casually distributed online data, especially from multiple social networks, which can comprehensively reveal users' personal concerns, interests [5, 22] and even personality traits [20, 21]. On the other hand, the real-world dataset may contain block-wise missing data due to some users' inactivity in social networks. Moreover, the specific task also brings us another issue in terms of data collection and ground truth construction. Therefore, the proposed scheme naturally fits this application scenario.

Our main contributions are threefold:

- We propose a novel MSNL model, which is able to model both the source confidence and source consistency. Specifically, we can obtain a closed-form solution by taking the inverse of a linear system, which has been mathematically proven to be invertible.
- We propose an approach to deal with missing data in multiple social networks, which first learns a common latent subspace shared by different sources [12]; the original missing data can then be derived in turn.
- We empirically evaluate our proposed scheme on the application of volunteerism tendency prediction. In addition, we develop a set of volunteer-oriented features to characterize users' volunteerism tendency. We have released our compiled dataset[2] to facilitate other researchers to repeat our experiments and verify their proposed approaches.

The remainder of this paper is structured as follows. Section 2 briefly reviews the related work. Section 3 describes the proposed MSNL model. Missing data completion is introduced in Section 4. Section 5 presents the set of volunteer-oriented features we developed. Section 6 details the experimental results and analysis, followed by our concluding remarks in Section 7.

[1] According to the Pew Research Internet Project's Social Media Update 2014: http://www.pewinternet.org/
[2] The compiled dataset is currently publicly accessible via: http://multiplesocialnetworklearning.azurewebsites.net/

2. RELATED WORK

Although our work is distinguished from multi-view learning, we can still benefit from their efforts. Zhang et al. [28] proposed an inductive multi-view multi-task learning model (regMVMT). regMVMT penalizes the disagreement of models learned from different sources over the unlabeled samples. Besides, the authors also studied structured missing data, which is completely missing for a source in terms of a task. In other words, if a source is available for a task, then all samples will have data from this source. However, they overlooked the source weights and did not pay attention to partially structured missing data, which are both what we are concerned with. Yuan et al. [25] introduced an incomplete multi-source feature learning method, avoiding the direct inference of block-wise missing data. Particularly, the authors split the incomplete data into disjoint groups, where they conducted feature learning independently. However, such a mechanism prevents us from conducting source-level analysis. Later, Xiang et al. [24] investigated multi-source learning with block-wise missing data with an application to Alzheimer's Disease prediction and proposed the iSFS model. Apart from feature-level analysis, the authors also conducted source-level analysis by introducing the weights of models obtained from different sources. However, ignoring the consistency relationships among different models seems inappropriate. In addition, the authors also adapted the model to handle cases where block-wise missing data exists. Different from their work, we infer the missing data by making full use of the available data before applying MSNL, which is more generalizable to other applications.
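Before turning to the model, the block-wise missing-data setting discussed above can be made concrete with a toy example. The snippet below is our own illustration, not part of the paper's pipeline: it builds three per-source feature matrices with missing blocks and shows why the naive concatenation baseline shrinks the usable training set.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dims = 6, [4, 3, 2]          # 6 users, feature sizes of the 3 sources

# One feature matrix per source; np.nan marks a user missing from a source
X = [rng.normal(size=(N, d)) for d in dims]
X[1][4:, :] = np.nan            # users 4-5 inactive on source 2
X[2][2:4, :] = np.nan           # users 2-3 inactive on source 3

# Naive baseline: concatenate sources into one D-dimensional space.
# It forces us to drop every user with any missing block.
X_cat = np.hstack(X)                        # shape (N, D), D = 9
complete = ~np.isnan(X_cat).any(axis=1)     # users usable by the baseline
print(X_cat.shape, int(complete.sum()))     # (6, 9) 2
```

Only two of the six users survive concatenation here, which is the training-size collapse the introduction warns about.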
3. MULTIPLE SOCIAL NETWORK LEARNING

This section details our proposed MSNL model and derives an analytic solution by solving the inverse of a linear system, whose invertibility is proved rigorously.

3.1 Notation

We first declare some notations. In particular, we use bold capital letters (e.g., X) and bold lowercase letters (e.g., x) to denote matrices and vectors, respectively. We employ non-bold letters (e.g., x) to represent scalars, and Greek letters (e.g., λ) as parameters. If not clarified, all vectors are in column form.

Suppose we have a set of N labeled data samples and S ≥ 2 social networks. We compile the S social networks with an index set C = {1, 2, · · · , S}. Let D_s and N_s denote the number of features and samples in the s-th social network, s ∈ C, respectively. Let X_s ∈ R^{N×D_s} denote the feature matrix extracted from the s-th social network, where each row represents a user sample. Then the dimension of features extracted from all these social networks is D = \sum_{s=1}^{S} D_s. The whole feature matrix can be written as X = {X_1, X_2, · · · , X_S} ∈ R^{N×D}, and y = [y_1, y_2, · · · , y_N]^T ∈ {1, −1}^{N×1} is the corresponding label vector.

3.2 Problem Formulations

Based on a set of data samples with S social networks, we can learn S predictive models, where each model is individually and independently trained on one social network. The final predictive model can be strengthened via a linear combination of these S models. Mathematically, we learn one linear mapping function f_s for the s-th social network. In addition, we assume that the mapping functions learned from all social networks agree with one another as much as possible. Particularly, we can formalize this assumption using a regularization function. Using the least square loss function, we have the following objective function,

    \min_{f_s} \frac{1}{2N}\|y - f(X)\|^2 + \frac{\mu}{2N}\sum_{s=1}^{S}\sum_{s' \neq s}\|f_s(X_s) - f_{s'}(X_{s'})\|^2 + \frac{\lambda}{2}\|f\|^2,    (1)

where f(X) is the final predictive model and f_s(X_s) is the prediction result generated from data X_s. λ and µ are the nonnegative regularization parameters that regulate the sparsity of the solution regarding f_s and the disagreement among models learned from different social networks, respectively. If we just treat the confidence of different social networks equally, the final predictive model can be formalized as follows,

    f(X) = \frac{1}{S}\sum_{s=1}^{S} f_s(X_s).    (2)

However, in reality, different social networks always contribute with different confidence to the final prediction, and we consider modeling the weights of multiple sources instead of treating all sources equally by introducing the weight vector α = [α_1, α_2, · · · , α_S]^T ∈ R^{S×1}, where α_s controls the weight of the model learned from the s-th social network. Then the final model is defined as follows,

    f(X) = \sum_{s=1}^{S} \alpha_s f_s(X_s), \quad \text{subject to } e^T\alpha = 1,    (3)

where e = [1, 1, · · · , 1]^T ∈ R^{S×1}. It is worth mentioning that we do not impose the constraint α_s ≥ 0, as we want to keep both positive and negative weights. Positive weights indicate positive correlations of social networks with the final results, while negative weights reflect negative correlations between the given task and different sources, which may contain unreliable and noisy data.

For the s-th social network, we learn a linear mapping function indexed by a model w_s ∈ R^{D_s×1}. Then the objective function can be rewritten as follows,

    \min_{w_s,\alpha} \frac{1}{2N}\Big\|y - \sum_{s=1}^{S}\alpha_s X_s w_s\Big\|^2 + \frac{\mu}{2N}\sum_{s=1}^{S}\sum_{s' \neq s}\|X_s w_s - X_{s'} w_{s'}\|^2 + \frac{\lambda}{2}\sum_{s=1}^{S}\|w_s\|^2 + \beta\|\alpha\|_2^2,    (4)

where e^T α = 1 and β is the regularization parameter controlling the sparsity of the solution regarding α.

3.3 Optimization

We adopt the alternating optimization strategy to solve for the two variables α and w_s in Eqn. (4). In particular, we optimize one variable while fixing the other in each iteration. We keep this iterative procedure until the objective function converges.

3.3.1 Computing α with w_s fixed

We denote the objective function as Γ. For simplicity, we replace y in Eqn. (4) by y e^T α, as e^T α = 1. With the help of the Lagrangian, Γ can be rewritten as follows,

    \min_{\alpha} \frac{1}{2N}\|y e^T\alpha - XW\alpha\|^2 + \beta\|\alpha\|_2^2 + \delta(1 - e^T\alpha),    (5)

where δ is the nonnegative Lagrange multiplier and W = diag(w_1, w_2, · · · , w_S) ∈ R^{D×S}. Taking the derivative of Γ with respect to α, we have,

    \frac{\partial\Gamma}{\partial\alpha} = \frac{1}{N}(y e^T - XW)^T(y e^T - XW)\alpha + \beta\alpha - \delta e.    (6)

Setting Eqn. (6) to zero, it can be derived that,

    \alpha = \delta M^{-1} e,    (7)

where

    M = \frac{1}{N}(y e^T - XW)^T(y e^T - XW) + \beta I.    (8)

Since e^T α = 1, we can obtain that,

    \delta = \frac{1}{e^T M^{-1} e}, \qquad \alpha = \frac{M^{-1} e}{e^T M^{-1} e}.    (9)

Obviously, M ∈ R^{S×S} is positive definite and invertible according to the definition. We thus can obtain the analytic solution of α as Eqn. (9). Moreover, we note that when the prediction results learned from all social networks are equal, i.e., X_1 w_1 = X_2 w_2 = · · · = X_S w_S, the same weights will be assigned, i.e., α_1 = α_2 = · · · = α_S. In addition, Eqn. (9) tends to assign a higher weight α_s if a smaller difference exists between y and X_s w_s.

3.3.2 Computing w_s with α fixed

When α is fixed, we compute the derivative of Γ regarding w_s as follows,

    \frac{\partial\Gamma}{\partial w_s} = \frac{1}{N}\alpha_s X_s^T\Big(\sum_{s=1}^{S}\alpha_s X_s w_s - y\Big) + \frac{\mu}{N} X_s^T\sum_{s' \neq s}(X_s w_s - X_{s'} w_{s'}) + \lambda w_s
    = \Big[\lambda I + \frac{\alpha_s^2}{N} X_s^T X_s + \frac{\mu(S-1)}{N} X_s^T X_s\Big] w_s + \sum_{s' \neq s}\frac{\alpha_s\alpha_{s'} - \mu}{N} X_s^T X_{s'} w_{s'} - \frac{\alpha_s}{N} X_s^T y,    (10)

where I is a D_s × D_s identity matrix. Setting Eqn. (10) to zero and rearranging the terms, all w_s's can be learned jointly by the linear system Lw = t,

    \begin{bmatrix} L_{11} & L_{12} & \cdots & L_{1S} \\ L_{21} & L_{22} & \cdots & L_{2S} \\ \vdots & \vdots & \ddots & \vdots \\ L_{S1} & L_{S2} & \cdots & L_{SS} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_S \end{bmatrix} = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_S \end{bmatrix},    (11)

where L ∈ R^{D×D} is a sparse block matrix with S × S blocks, and w = [w_1^T, w_2^T, · · · , w_S^T]^T ∈ R^{D×1} and t = [t_1^T, t_2^T, · · · , t_S^T]^T ∈ R^{D×1} are both sparse block vectors with S × 1 blocks. t_s, L_ss and L_ss' are defined as follows,

    t_s = \frac{\alpha_s}{N} X_s^T y, \qquad L_{ss} = \lambda I + \frac{\alpha_s^2 - \mu}{N} X_s^T X_s + \frac{\mu S}{N} X_s^T X_s, \qquad L_{ss'} = \frac{\alpha_s\alpha_{s'} - \mu}{N} X_s^T X_{s'}.    (12)

Technically, t can be treated as a constant vector as α is fixed. It is worth noting that L is symmetric, as L_{ss'} = L_{s's}^T. If we can prove that L is invertible, then we can derive the closed-form solution of w as follows,

    w = L^{-1} t.    (13)

We now show L is invertible by proving that L is a positive-definite matrix. Let h = [h_1^T, h_2^T, · · · , h_S^T]^T ∈ R^{D×1} ≠ 0 be an arbitrary block vector, where h_i ∈ R^{D_i×1}, i ∈ C. Then we need to prove that

    h^T L h = \sum_{i=1}^{S}\sum_{j=1}^{S} h_i^T L_{ij} h_j = \lambda\|h\|^2 + \frac{1}{N}\Big[\sum_{i=1}^{S}\|\alpha_i X_i h_i\|^2 + \mu(S-1)\sum_{i=1}^{S}\|X_i h_i\|^2 + \sum_{i=1}^{S}\sum_{j \neq i}\alpha_i h_i^T X_i^T \alpha_j X_j h_j - \mu\sum_{i=1}^{S}\sum_{j \neq i} h_i^T X_i^T X_j h_j\Big]    (14)

is always larger than zero. In fact, given arbitrary vectors b_i, summing \|b_i - b_j\|^2 ≥ 0 over all pairs yields,

    (S-1)\sum_{i=1}^{S}\|b_i\|^2 \geq \sum_{i=1}^{S}\sum_{j \neq i} b_i^T b_j.    (15)

Therefore, as S ≥ 2, we have the following inequality,

    \mu(S-1)\sum_{i=1}^{S}\|X_i h_i\|^2 \geq \mu\sum_{i=1}^{S}\sum_{j \neq i}(X_i h_i)^T X_j h_j.    (16)

Besides, we know that,

    \sum_{i=1}^{S}\|\alpha_i X_i h_i\|^2 + \sum_{i=1}^{S}\sum_{j \neq i}\alpha_i h_i^T X_i^T \alpha_j X_j h_j = \Big\|\sum_{i=1}^{S}\alpha_i X_i h_i\Big\|^2 \geq 0.    (17)

Based upon Eqn. (16) and Eqn. (17), we have that,

    h^T L h \geq \lambda\|h\|^2.    (18)

As h ≠ 0, h^T L h is always larger than zero. Consequently, L is invertible. The overall procedure for alternating optimization is summarized in Algorithm 1. As each iteration decreases Γ, whose lower bound is zero, we can guarantee the convergence of Algorithm 1 [7, 16].

Algorithm 1 Alternating optimization for solving Eqn. (4)
Input: X, y, λ, β, µ
Output: α, w
1: Initialize (w)^0 by fitting each source individually on the available data. Initialize (α)^0 = [1/S, 1/S, · · · , 1/S].
2: for k = 1, 2, · · · do
3:   Compute (α)^k according to Eqn. (9).
4:   Update (w)^k according to Eqn. (13).
5:   if the objective value stops decreasing then
6:     return α = (α)^k and w = (w)^k
7:   end if
8: end for

4. MISSING DATA COMPLETION

In this section, we deal with a more challenging and realistic situation, where block-wise missing data exists, and propose an approach for multiple social network data completion (MSNDC). In such situations, user samples may not be active in all social networks, which leads to block-wise missing data. Suppose we have S data sources in total and each sample has at least one data source available. We employ the subset C_i ⊆ C to indicate the presence of each source and the signature of a specific social network combination. Based on these combinations, all the data samples can be split into multiple exclusive sets, where each set corresponds to a combination. Figure 2 illustrates the incomplete data in our dataset. As can be seen, all users have complete features from SN1, while some users miss data in SN2 or SN3. Therefore, our dataset can be split by four exclusive social network combinations: C1 = {1, 2}, C2 = {1, 2, 3}, C3 = {1, 3}, C4 = {1}.

Figure 2: Illustration of the incomplete data from three sources. X_s^{C_i} denotes the samples generated from social network s that are only available in the social network combination C_i.

Inspired by [10], we use Non-negative Matrix Factorization (NMF) to explore the latent spaces that are shared by different social networks, and further infer the missing data based upon these latent spaces. It is reasonable to assume that the data from different social networks about the same user shares certain latent features. We employ X_s^{C_i} ∈ R^{N_{C_i}×D_s} to denote the samples generated from the s-th social network; it only contains samples that are available in the set of social networks C_i, where N_{C_i} stands for the number of these samples. We use U_s ∈ R^{z×D_s} to represent the latent basis matrix for the s-th social network, and P_s^{C_i} ∈ R^{N_{C_i}×z} to denote the corresponding latent representation of the feature matrix X_s^{C_i}. z is the dimension of the shared latent space of different social networks. The intuitive assumption is that for the samples available in both the s-th and s'-th social networks, their corresponding latent representations should also be quite similar. In particular, we impose this constraint on NMF as follows,

    P_s^{C_i} = P_{s'}^{C_i} = P^{C_i},    (19)

where s ≠ s', s ∈ C_i, and s' ∈ C_i. We thus learn the shared subspaces by the following objective function,

    \min_{U_s \geq 0, P_s \geq 0} \left\| \begin{bmatrix} X_1^{\{1\}} \\ X_1^{\{1,2\}} \\ X_1^{\{1,3\}} \\ X_1^{\{1,2,3\}} \end{bmatrix} - \begin{bmatrix} P^{\{1\}} \\ P^{\{1,2\}} \\ P^{\{1,3\}} \\ P^{\{1,2,3\}} \end{bmatrix} U_1 \right\|^2 + \nu\|P_1\|_1 + \eta\|U_1\|_1 + \left\| \begin{bmatrix} X_2^{\{1,2\}} \\ X_2^{\{1,2,3\}} \end{bmatrix} - \begin{bmatrix} P^{\{1,2\}} \\ P^{\{1,2,3\}} \end{bmatrix} U_2 \right\|^2 + \nu\|P_2\|_1 + \eta\|U_2\|_1 + \left\| \begin{bmatrix} X_3^{\{1,3\}} \\ X_3^{\{1,2,3\}} \end{bmatrix} - \begin{bmatrix} P^{\{1,3\}} \\ P^{\{1,2,3\}} \end{bmatrix} U_3 \right\|^2 + \nu\|P_3\|_1 + \eta\|U_3\|_1,    (20)

where ν and η are the nonnegative tradeoff parameters for the regularizations. Similarly, we employ the alternating optimization strategy to solve the optimization in Eqn. (20). To be more specific, we first initialize U_s and compute the optimal P_s. Afterwards, U_s is updated based on the computed P_s. We keep this iterative procedure until the objective function converges.

The proposed approach differs from [10] in the following three aspects. First, MSNDC is generalized to handle the more challenging scenario where data samples are extracted from more than two social networks. Second, apart from regularizing the latent representation matrix, we also incorporate regularization on the latent basis matrix. Third, we further derive the original missing data from the latent representation, whereas the authors in [10] just apply clustering algorithms directly to the latent representation of the data instead of the original data. This is due to two considerations. One is that we believe the value of the original data is higher than that of the latent representation. The other is that we need to preserve the heterogeneity among data from different sources to fit the MSNL model.

4.1 Optimization

In order to increase the efficiency of the iterative procedure, we initialize U_s by optimizing the following objective function,

    \min_{U_s \geq 0} \|X_1^{\{1,2,3\}} - P^{\{1,2,3\}} U_1\|^2 + \nu\|P^{\{1,2,3\}}\|_1 + \eta\|U_1\|_1 + \|X_2^{\{1,2,3\}} - P^{\{1,2,3\}} U_2\|^2 + \eta\|U_2\|_1 + \|X_3^{\{1,2,3\}} - P^{\{1,2,3\}} U_3\|^2 + \eta\|U_3\|_1.    (21)

We then alternately optimize U_s and P_s until the objective function converges. Specifically, we employ the greedy coordinate descent (GCD) approach [9], which has been proven to be tremendously fast in solving NMF decomposition with L1-norm regularization. Finally, we obtain P_s, U_s, s ∈ C, based on which we can infer the missing data as follows,

    \hat{X}_s^{C_i} = P^{C_i} U_s, \quad \forall s \notin C_i.    (22)

Algorithm 2 summarizes the overall procedure for alternating optimization.
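As a sanity check on the derivation, Algorithm 1 can be sketched in a few lines of NumPy, assuming complete data: the closed form of Eqn. (9) for α and the linear system of Eqns. (11)-(13) for w. The toy data, hyperparameter values, and convergence-by-iteration-count below are our own simplifications, not the paper's setup.

```python
import numpy as np

def msnl_fit(Xs, y, lam=0.1, beta=0.1, mu=0.1, iters=20):
    """Alternating optimization of Eqn. (4): closed-form w via the block
    linear system Lw = t (Eqns. 11-13), then closed-form alpha (Eqn. 9)."""
    S, N = len(Xs), len(y)
    dims = [X.shape[1] for X in Xs]
    offs = np.cumsum([0] + dims)
    alpha = np.full(S, 1.0 / S)
    for _ in range(iters):
        # --- update w with alpha fixed: assemble blocks of L and t
        L = np.zeros((sum(dims), sum(dims)))
        t = np.concatenate([a / N * X.T @ y for a, X in zip(alpha, Xs)])
        for s in range(S):
            for r in range(S):
                G = Xs[s].T @ Xs[r]
                if s == r:  # L_ss = lam*I + (alpha_s^2 + mu(S-1))/N * Xs'Xs
                    blk = lam * np.eye(dims[s]) + (alpha[s]**2 + mu*(S-1)) / N * G
                else:       # L_ss' = (alpha_s*alpha_s' - mu)/N * Xs'Xs'
                    blk = (alpha[s]*alpha[r] - mu) / N * G
                L[offs[s]:offs[s+1], offs[r]:offs[r+1]] = blk
        w = np.linalg.solve(L, t)
        ws = [w[offs[s]:offs[s+1]] for s in range(S)]
        # --- update alpha with w fixed (Eqn. 9)
        e = np.ones(S)
        XW = np.column_stack([X @ w_ for X, w_ in zip(Xs, ws)])  # N x S
        M = (np.outer(y, e) - XW).T @ (np.outer(y, e) - XW) / N + beta * np.eye(S)
        Minv_e = np.linalg.solve(M, e)
        alpha = Minv_e / (e @ Minv_e)
    return alpha, ws

# Toy check: source 1 carries the label signal, source 2 is pure noise
rng = np.random.default_rng(1)
X1 = rng.normal(size=(200, 5))
y = np.sign(X1 @ rng.normal(size=5))
X2 = rng.normal(size=(200, 3))
alpha, ws = msnl_fit([X1, X2], y)
print(round(alpha.sum(), 6))   # 1.0: the constraint e^T alpha = 1 holds
print(alpha[0] > alpha[1])     # informative source receives more weight
```

The run illustrates the paper's observation that Eqn. (9) assigns a higher weight α_s to the source whose predictions X_s w_s lie closer to y.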
Algorithm 2 Alternating optimization for solving Eqn. (20)
Input: X_1, X_2, X_3, ν, η
Output: X̂
1: Initialize U_s^(0) according to Eqn. (21).
2: for k = 1, 2, · · · do
3:   for s = 1, 2, · · · , S do
4:     Compute P_s^(k) according to Eqn. (20) via the GCD approach.
5:     Update U_s^(k) according to Eqn. (20) via the GCD approach.
6:     if the objective value stops decreasing then
7:       return U_s = U_s^(k) and P_s = P_s^(k)
8:     end if
9:   end for
10: end for
11: for j = 1, 2, · · · , S do
12:   for C_q ⊆ C do
13:     if j ∈ C_q then
14:       X̂_j^{C_q} = X_j^{C_q}
15:     else
16:       Infer X̂_j^{C_q} according to Eqn. (22).
17:     end if
18:   end for
19: end for

5. APPLICATION: VOLUNTEERISM TENDENCY PREDICTION

In this work, we apply the proposed scheme to an application scenario: volunteerism tendency prediction. In modern society, volunteers are extremely crucial for NPOs to sustain their continuing operations. The discovery of users' volunteerism tendency can significantly facilitate the recruitment of volunteers for NPOs, saving considerable cost in finding potential volunteers. In particular, we cast the problem of volunteerism tendency prediction as binary user classification. If the predicted tendency score of a given user is larger than a pre-defined threshold γ, we regard this user as a volunteer. In this work, we explore three popular social networks: Twitter, Facebook and LinkedIn, as they are representative of a public, a private, and a professional social network, respectively. Besides, it is known that users exhibit different aspects on different social networks [1], and the combination of these three social networks helps to better characterize user behaviors on social platforms.

5.1 Multiple Social Account Mappings

To represent the same users with multiple sources, we need to first tackle the problem of "Social Account Mapping", which aims to align the same users across different social networks by linking their multiple social accounts [1]. To accurately establish this mapping, we employ emerging social services such as About.me[3] and Quora[4], which encourage users to explicitly list their multiple social accounts on one profile.

We proposed two strategies to collect data from About.me:

- Keyword search: We searched About.me with the keyword "volunteer" and obtained 4,151 volunteer candidates.
- Random selection: We employed the Random API[5] provided by About.me to collect non-volunteers. This API returns a specified number of random user profiles. Finally, we harvested 1,867 non-volunteer candidates. It is worth mentioning that volunteers may be present among these random users.

To enlarge our dataset, we also collected candidates from Quora by the breadth-first-search method. Particularly, we took advantage of both the follower and followee[6] relations provided by Quora. Initially, we selected two popular users as the seed users and then explored all their neighboring connected users. We applied a similar exploration approach to all other non-seed users. In the end, we collected 172,235 users' profiles and only retained those who have accounts on Facebook, Twitter and LinkedIn.

5.2 Ground Truth Construction

Based on these candidates, we launched a crawler to collect their historical social contents, including their basic profiles, social posts and relations. However, the traditional web-based crawler is not applicable to Facebook due to its dynamic loading mechanism. We thus resorted to Selenium[7] to simulate users' click and scroll operations in a Firefox browser and load users' publicly available information. We limited the access rate to one request per second to avoid being blocked by the robot checkers. It is worth mentioning that the data we collected is all publicly available. On the other hand, due to privacy constraints, we could not access users' social relations on Facebook and LinkedIn. We hence only collected users' followee relations on Twitter.

In order to improve the quality of our dataset, we employed three annotators to finalize our ground truth. As users tend to provide more complete and reliable profiles on LinkedIn, we guided the annotators to study the LinkedIn profiles of candidate users and determine whether they are "volunteers" by majority vote. To ensure a uniform labeling procedure, we provided them with a guideline. Given a user's LinkedIn profile, we classified the user as a volunteer if and only if this user lists his/her volunteer experiences in the section "Volunteer Experience & Causes" or the section "Experience". Candidates who do not satisfy the above criteria were tagged as non-volunteers. We focused on LinkedIn to determine whether users are volunteers because the volunteer experiences in LinkedIn are the most straightforward evidence to identify volunteers. It should be noted that those who do not mention their volunteer experiences on LinkedIn are not necessarily "non-volunteers". However, the absence of these mentions, at least, reveals their limited interest and low enthusiasm in volunteerism. Therefore, in our work, we broadly defined users as "non-volunteers" if they do not mention relevant volunteerism experiences on LinkedIn.

Table 1 lists the statistics of our dataset. We obtained the data for 1,425 volunteers and 4,011 non-volunteers according to the aforementioned strategies. The crawling was conducted between 22nd August and 11th September, 2013. Here we only selected a subset of the non-volunteer data to balance the dataset and avoid training bias. To facilitate this line of research, this dataset has been released after certain privacy-preservation processing.

However, in reality, not all users are active enough on all social networks. To ensure the data quality, we treated inactive users as missing with respect to a specific social network. Therefore, there exists block-wise missing data in our dataset. In particular, we treated a user as missing in Twitter or Facebook if this user has fewer than 10 historical social posts. In addition, due to the absence of a social posting mechanism in LinkedIn, we treated a user as missing in LinkedIn if the word count of this user's profile is less than 50 (excluding the contents of the sections "Volunteer Experience & Causes" and "Experience").

[3] https://about.me/
[4] http://quora.com/
[5] http://about.me/developer/api/docs/
[6] If A follows B, then A is B's follower and B is A's followee.
[7] http://docs.seleniumhq.org/download/
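The latent-space completion of Section 4 can be illustrated with a minimal stand-in. The sketch below is our simplification: it uses a bare multiplicative-update NMF in place of the paper's L1-regularized GCD solver, and an unconstrained least-squares fit for the second source's basis, so it only approximates the spirit of Eqns. (20)-(22) on a two-source toy example.

```python
import numpy as np

def nmf(X, z, iters=300, eps=1e-9):
    """Plain multiplicative-update NMF: X ~ P @ U with P >= 0, U >= 0.
    A bare-bones stand-in for the paper's regularized GCD solver."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    P = rng.random((n, z)) + 0.1
    U = rng.random((z, d)) + 0.1
    for _ in range(iters):
        U *= (P.T @ X) / (P.T @ P @ U + eps)
        P *= (X @ U.T) / (P @ U @ U.T + eps)
    return P, U

# Toy data: both sources generated from the same nonnegative latent P_true
rng = np.random.default_rng(2)
P_true = rng.random((8, 2))
X1 = P_true @ rng.random((2, 5))      # source 1: observed for everyone
X2 = P_true @ rng.random((2, 4))      # source 2: last 3 users missing

# Learn the shared latent representation from the always-present source,
# then fit U2 on the rows complete in both sources (plain least squares,
# dropping the nonnegativity constraint on U2 for simplicity)
P, U1 = nmf(X1, z=2)
U2 = np.linalg.lstsq(P[:5], X2[:5], rcond=None)[0]
X2_hat = P[5:] @ U2                   # Eqn. (22): infer the missing block
err = np.abs(X2_hat - X2[5:]).mean()
print(X2_hat.shape)                   # (3, 4)
```

Because both sources share the latent factors, the inferred block X2_hat stays close to the held-out X2 rows on this toy data.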
Besides, we also utilized the positive-negative emotion ratio to mechanism in LinkedIn, we treated a user as missing8 in further reflect users’ emotional states. Let L(•) represent LinkedIn if the word count of this user’s profile is less than the percentage of users’ words in certain LIWC category. 50. Figure 3 shows the statistics of our incomplete data. As The positive-negative emotion ratio is defined as, can be seen, about 50% of users have complete data from all L(pos) + ξp three social networks. 1% and 47% of users only miss the P Nemo = L(pos)log , (23) data either from Facebook and LinkedIn, while 2% of users L(neg) + ξn miss the data from both of them. where ξp and ξn are introduced to avoid the situation: individuals have no positive or negative emotional word. They are both set as 0.0001. In total, we have 16 dimension LIWC features, extracted from Twitter and Facebook. User topics. According to our observation, volunteers may have, on average, a higher probability of talking about topics such as social caring or giving back, while the non-volunteers may mention other topics more often. This motivates us to explore the topic distributions of users’ social posts to identify volunteers. We generated topic distributions using Latent Dirichlet Allocation (LDA) [4], which has been widely found to be useful in latent topic modeling [8, 23]. Based on perplexity [14] metric frequently utilized to find the optimal number of hidden topics, we ultimately obtained 52, 26, 42 dimensional topic-level features over users’ Twitter, Figure 3: Statistics of the incomplete data. Tw: Facebook and LinkedIn data, respectively. Users with Twitter data only; Tw+Fb: Users with Contextual topics. We define users’ contextual topics Twitter and Facebook data only; Tw+In: Users as the topics of users’ connections. We believe that with Twitter and Linkedin data only; Tw+Fb+In: the contextual topics intuitively reflect the contexts of Users without missing data. users. 
The proverb "He that lies down with dogs must rise up with fleas" tells us that context significantly affects a user's tendency. Particularly, we studied followee and retweeting10 connections on Twitter because they intuitively reflect the topics that users care about. As bio descriptions are usually provided by users to briefly introduce themselves and may indicate users' summarized interests, we integrated the bios of a user's followees, and of those whose tweets are retweeted by this user, into two kinds of bio documents, on which we further applied the LDA model. We utilized the perplexity to fix the dimensions of the topic-level features over followees' bio documents and retweeting users' bio documents as 40 and 20, respectively. In this work, we only explored the contextual topics in Twitter, since we were unable to crawl the connections' profiles in LinkedIn, and the bio descriptions are usually missing in Facebook.

5.3 Features

To capture users' volunteerism tendency, we extracted a rich set of volunteer-oriented features.

5.3.1 Demographic Characteristics

The study in [19] reported that some demographic characteristics, such as education and income level, are strong indicators of volunteerism. This inspires us to extract demographic characteristics from users' profiles, especially the Facebook and LinkedIn profiles. In our work, we explored users' demographic characteristics, including Gender, Relationship status, Education level, and Number of social connections.

5.3.2 Linguistic Features

We also extracted linguistic features, including Linguistic Inquiry and Word Count (LIWC) features, user topics and contextual topics. LIWC features. LIWC is a widely-used, transparent psycho-linguistic lexicon.

5.3.3 Behavior-based Features

This kind of feature is characterized by users' posting behavior patterns and networking behavior patterns. The former focuses on the writing style of users' social posts, while the latter captures their egocentric network features.
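The perplexity-based choice of the number of hidden topics described above can be sketched with off-the-shelf tooling. This is a minimal illustration using scikit-learn, which the paper does not state it used; for brevity, perplexity is measured on the training corpus, whereas a held-out split is preferable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def best_topic_model(docs, candidate_ks=(20, 40, 60)):
    """Fit LDA for several candidate topic counts and keep the model
    with the lowest perplexity; returns the model and the per-document
    topic distributions used as topic-level features."""
    X = CountVectorizer(stop_words="english").fit_transform(docs)
    best = min(
        (LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
         for k in candidate_ks),
        key=lambda lda: lda.perplexity(X),
    )
    return best, best.transform(X)
```

Each row of the returned matrix is a topic distribution over the chosen number of topics and sums to one.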
LIWC plays an important role in predicting users' personality [2, 15]. The main component of LIWC is a dictionary which contains the mapping from words to categories9.

8 Here we exclude the contents of section "Volunteer experience & Causes" and section "Experience".
9 http://www.liwc.net/
10 If A broadcasts a tweet posted by B, then B is A's retweeting user.

SVM: We chose the learning formulation with the radial-basis function kernel. We implemented this method based on LIBSVM [6].

RLS: The regularized least squares model [11] aims to minimize the objective function 1/(2N) ||y - Xw||^2 + (λ/2) ||w||^2. In fact, the RLS model can be deduced from MSNL via the settings α = [1/S, 1/S, ..., 1/S]^T, µ = 0 and β = 0.

iSFS: The third baseline is the incomplete source-feature selection model proposed in [24]. This model only assigns weights to models learned from different social networks but ignores the relationships among them. We can derive iSFS from MSNL by making µ = 0.

regMVMT: The fourth baseline is the regularized multi-view multi-task learning model [28].

Posting behavior patterns. Posting behavior patterns have been investigated in many scenarios, spanning from age estimation to social spammer discovery [3, 13]. These patterns can be used to depict users' participation in information diffusion, which strongly correlates with volunteerism tendency. On one hand, we employed the fraction of users' posts containing certain behaviors, including emoticons, slang words, acronyms, hashtags, URLs, and user mentions, to intuitively reflect users' engagement in topic discussion and social interaction. On the other hand, we observed that users' posting behaviors in social networks can be classified into a few categories. For example, posts in Twitter can be classified into two categories, Ctw = {tweets, retweets}, while posts in Facebook can roughly be split into eight types: Cfb = {share link, share video, share status, share photo, change photo, repost, post, tagged}.
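The per-post behavior fractions above can be sketched as follows. The regular expressions are illustrative stand-ins; the exact emoticon, slang and acronym lexicons used in the paper are not published.

```python
import re

# Illustrative patterns for a subset of the behaviors mentioned above.
PATTERNS = {
    "hashtag": re.compile(r"#\w+"),
    "mention": re.compile(r"@\w+"),
    "url": re.compile(r"https?://\S+"),
    "emoticon": re.compile(r"[:;]-?[)(DPp]"),
}

def behavior_fractions(posts):
    """Fraction of a user's posts that contain each behavior."""
    n = max(len(posts), 1)
    return {name: sum(bool(p.search(t)) for t in posts) / n
            for name, p in PATTERNS.items()}
```

Each fraction is a single feature value in [0, 1] for the user, one per behavior.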
The distributions of users' posts over these categories also reflect their participation in information diffusion, revealing whether a given user tends to share information in social networks. When it comes to LinkedIn, we utilized profile completeness to characterize users' behaviors. Based on our observation, we found that volunteers tend to provide more information for all the sections. This not only reflects volunteers' active participation in LinkedIn but also signals their self-confidence and openness to the public. Profile completeness is defined as a boolean vector over six dimensions to denote the presence of the six common sections in LinkedIn profiles: summary, interest, language, education, skill and honor. We excluded the sections on experience and volunteer experience & causes, because the ground truth is built on these two sections.

The regMVMT model only regulates the relationships among different views but fails to take the source confidence into account. We can derive regMVMT from MSNL by making α = [1/S, 1/S, ..., 1/S]^T.

Table 2: Performance of different models (%).
Approaches    F1-measure    P-value
SVM           83.11         0.038
RLS           82.82         0.025
regMVMT       84.07         0.173
iSFS          84.72         0.281
MSNL          85.59         -

Table 2 shows the performance comparison between the baselines and our proposed MSNL. We noticed that MSNL significantly outperforms SVM and RLS. This implies that the information on multiple social networks is complementary and characterizes users' volunteerism tendency consistently. It also proves that the correlations of different social networks with the task of volunteerism tendency prediction cannot be treated equally.

Egocentric network patterns. We also studied users' social behaviors from their egocentric networks. Intuitively, we believe that users belonging to a certain class tend to be connected with several class-specific accounts, as the saying goes, "birds of a feather flock together". Therefore, volunteers should interact with some typical accounts in social media.
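This intuition can be quantified, for instance, by the share of a user's connections that fall inside a curated set of class-specific accounts (a sketch; the function name and the plain-list representation are our assumptions):

```python
def typical_account_features(friends, typical_accounts):
    """Frequency and fraction of a user's 'friends' that belong to a
    curated set of class-specific accounts, plus the total friend count."""
    hits = sum(1 for f in friends if f in typical_accounts)
    total = len(friends)
    return hits, hits / total if total else 0.0, total
```

The three returned values correspond to the frequency, the fraction, and the total number of "friends" of a user.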
The set of typical accounts is denoted as FV. Inspired by [17], we measured the degree of a user's correlation with volunteerism by three features: the frequency and fraction of a user's "friends" that belong to FV, as well as the total number of "friends". In particular, we treated both the followees and retweeting users as the "friends" of users in Twitter. To construct FV, we utilized the Twitter profile repository Wefollow11, which allows us to find the most prominent people in a particular category. By crawling prominent users falling into the categories of Nonprofit, Charity, Volunteer, NGO, Community Service, Social Welfare and Christian from Wefollow, we obtained 23,285 accounts.

MSNL also achieves better performance as compared with iSFS and regMVMT, which are derivations of MSNL. This demonstrates that both the source confidence and the source consistency deserve particular attention.

6. EXPERIMENTS

We conducted extensive experiments to comparatively verify our proposed scheme from various angles, over a system equipped with an Intel i7 2.60 GHz CPU and 8 GB memory. In particular, we launched 10-fold cross validation for each experiment and reported the average performance. Each fold involves 2,249 training and 250 testing samples.

6.2 On Data Completion Comparison

We further evaluated the component for missing data completion with the following three baseline methods.
Remove: This method eliminates all data samples that are not complete.
Average: This method imputes the missing features with the average values of the corresponding feature items.
KNN: The missing data is inferred by averaging its K-nearest neighbors; K is experimentally set as 1.

Table 3: Performance of different models over different data completion strategies.
Approaches    SVM      RLS      MSNL
Remove        74.91    74.66    81.81
Average       82.09    81.99    85.43
KNN           82.60    82.22    85.55
MSNDC         83.11    82.82    85.59

6.1 On Model Comparison

We compared MSNL with four baselines. Before that, the data was completed by MSNDC. We also performed a significance test to validate the effectiveness of MSNL.

Table 3 shows the performance of different models over different data completion strategies. It can be seen that MSNDC outperforms the other strategies. Additionally, removing all incomplete data samples achieves the worst performance, which may be caused by the fact that it introduces training bias, making the dataset unbalanced, and reduces the size of the training dataset. We found that the percentage of volunteer samples decreases from 50% to 40% after filtering out all incomplete data samples.

11 http://wefollow.com/

Table 5: Hot topics discussed by volunteers. Followee and retweeting: contextual topics; Self: user topics.
Data source    Topic words
Followee       public, politics, rights, development
               editor, global, journalist, university
Retweeting     global, nonprofit, change, community
               health, education, learning, university
Self           woman, help, education, child
               volunteer, nonprofit, support

6.3 On Feature Comparison

To examine the discriminative features we extracted, we conducted experiments over different kinds of features using MSNL. We also performed a significance test to validate the advantage of combining multiple social networks. Table 4 comparatively shows the performance of MSNL in terms of different feature configurations. It can be seen that the linguistic features achieve the best performance, as compared against demographic characteristics and behavior-based features. This reveals that volunteerism tendency is better reflected by users' social contents, including their own social posts and the self-descriptions of their social connections.
This also implies that users with volunteerism tendency may talk about related topics and follow or retweet related social accounts. In addition, we found that contextual topics are more discriminative than users' own topics. This may be due to the fact that users' self-descriptions are of more value and contain less noise than users' tweets. Some hot topics discussed by volunteers are given in Table 5. Besides, the egocentric network patterns also play a dominant role in our task. This implies that one's social connections indeed reflect the user's personal concerns to a large extent.

Table 4: Performance of different features (%).
Features                         F1-measure
Demographic characteristics      68.43
Linguistic features              80.06
 - User topics                   75.04
 - Contextual topics             78.14
 - LIWC                          68.48
Behavior-based features          78.52
 - Posting behavior patterns     69.83
 - Egocentric network patterns   75.91

Table 6: Performance of different social network combinations (%). Facebook* and LinkedIn* both refer to the complete data, whose missing data is pre-inferred. F1: F1-measure.
Social network combinations      F1       p-value
Twitter                          82.35    4.2e-2
Facebook*                        73.53    5.0e-7
LinkedIn*                        74.49    3.1e-7
Twitter+Facebook*                83.67    1.1e-1
Twitter+LinkedIn*                83.84    1.4e-1
Facebook*+LinkedIn*              76.29    6.0e-6
Twitter+Facebook*+LinkedIn*      85.59    -

6.5 Size Varying of Positive Samples

In order to verify the usefulness of our model on a real-world dataset, where volunteers should account for a minority of the user population, we tuned the fraction of volunteer samples in our dataset. In particular, we fed x%, x ∈ [5, 50], of volunteer samples to our model with a stepsize of 5%. Figure 4 shows the F1-measure with respect to different fractions of volunteer samples for different models. As can be seen, our model achieves satisfactory performance even when volunteer samples account for only 5% of the whole dataset. This demonstrates that the proposed MSNL model is not sensitive to the percentage of positive samples.
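The positive-fraction experiment above can be reproduced with a simple down-sampling routine (a sketch; the names are ours, and we fix the negative set while shrinking the positive one):

```python
import random

def subsample_positives(samples, labels, fraction, seed=0):
    """Down-sample the positive (volunteer) class so that it makes up
    `fraction` of the returned dataset, keeping all negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # choose n_pos so that n_pos / (n_pos + len(neg)) ~= fraction
    n_pos = round(fraction * len(neg) / (1.0 - fraction))
    keep = rng.sample(pos, min(n_pos, len(pos))) + neg
    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

Sweeping `fraction` from 0.05 to 0.50 in steps of 0.05 and retraining at each step mimics the setting of this section.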
By contrast, SVM and RLS are relatively more sensitive to the fraction of volunteer samples in the dataset.

Figure 4: F1-measure at different fractions of volunteer samples.

6.4 On Source Comparison

To demonstrate the descriptiveness of multiple social network integration, we conducted experiments over various source combinations. Notably, the data from Facebook and LinkedIn is incomplete, and we need to infer the block-wise missing data first, taking advantage of the complete data samples from Twitter. Table 6 shows the performance of MSNL over different social network combinations. We noted that the more sources we incorporate, the better the performance that can be achieved. This implies complementary, rather than mutually conflicting, relationships among the sources. Moreover, we found that aggregating data from all three social networks achieves significantly better performance as compared to each single source. Additionally, as the performance obtained from different single social networks is not the same, this validates that incorporating the confidence of different social networks into MSNL is reasonable. Interestingly, we observed that MSNL over Twitter alone achieves much better performance, as compared to that over LinkedIn or Facebook alone. This may be caused by the fact that the most discriminative features evaluated in Section 6.3 are all extracted from Twitter.

6.6 Complexity Discussion

In order to analyze the complexity of MSNL, we need to derive the time complexity of constructing M, L and t as defined in Eqn. (8) and Eqn. (12), and of computing the inverses of M and L. Assuming D >> S, the construction of matrix M has a time complexity of O(NDS), and the construction of matrix L has a time complexity of O(ND^2). The cost of the matrix multiplications (Xs^T Xs') and that of constructing t involved in Eqn. (12) remains the same for all iterations.

8. REFERENCES

[1] F. Abel, E. Herder, G.-J. Houben, N. Henze, and D. Krause. Cross-system user modeling and personalization on the social web. UMUAI, 2013.
[2] B. Bazelli, A. Hindle, and E. Stroulia. On the personality traits of stackoverflow users. In ICSM, 2013.
[3] F. Benevenuto, T. Rodrigues, V. Almeida, J. Almeida, and M. Gonçalves. Detecting spammers and content promoters in online video social networks. In SIGIR, 2009.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 2003.
[5] D. Carmel, N. Zwerdling, I. Guy, S. Ofek-Koifman, N. Har'El, I. Ronen, E. Uziel, S. Yogev, and S. Chernov. Personalized social search based on the user's social network. In CIKM, 2009.
[6] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. TIST, 2011.
[7] Y. Gao, M. Wang, Z.-J. Zha, J. Shen, X. Li, and X. Wu. Visual-textual joint relevance learning for tag-based social image search. TIP, 2013.
[8] J. Guo, G. Xu, X. Cheng, and H. Li. Named entity recognition in query. In SIGIR, 2009.
[9] C.-J. Hsieh and I. S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In SIGKDD, 2011.

Since these costs remain fixed across all iterations and L is symmetric, we can save much practical time cost. Also, using the standard method, computing the inverses of the two core matrices, M and L, has complexity O(S^3) and O(D^3), respectively. Furthermore, using the method of Coppersmith and Winograd, the time cost can be bounded by O(S^2.376) and O(D^2.376) [26], respectively. We note that the speed bottleneck lies in the number of features and the number of social networks instead of the number of data samples. As S and D are usually small, especially S, MSNL should be efficient in terms of time complexity.

To validate the practical efficiency of the proposed MSNL model, we conducted a set of experiments. The comparison of the average time consumption of different models is shown in Table 7. As can be seen, MSNL shows superiority over SVM in terms of time cost, taking only 19% of the time that SVM uses.
By careful observation, we found that MSNL converges very fast, taking about 20 iterations on average. Even though MSNL takes more time than RLS and regMVMT, due to the consideration of source consistency and source confidence, it improves the performance in terms of F1-measure.

Table 7: Time cost of different models (in seconds).
Approach    Total (s)    Train (s)    Test (s)
SVM         2.0550       1.8211       0.2339
RLS         0.0639       0.0631       0.0008
regMVMT     0.0605       0.0595       0.0006
iSFS        0.5565       0.5557       0.0008
MSNL        0.3936       0.3929       0.0007

[10] S.-Y. Li, Y. Jiang, and Z.-H. Zhou. Partial multi-view clustering. In AAAI, 2014.
[11] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. A method for large-scale l1-regularized least squares problems with applications in signal processing and statistics. J-STSP, 2007.
[12] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.
[13] K. Lee, J. Caverlee, and S. Webb. Uncovering social spammers: social honeypots + machine learning. In SIGIR, 2010.
[14] D. Li, B. He, Y. Ding, J. Tang, C. Sugimoto, Z. Qin, E. Yan, J. Li, and T. Dong. Community-based topic modeling for social tagging. In CIKM, 2010.
[15] D. Markovikj, S. Gievska, M. Kosinski, and D. Stillwell. Mining facebook data for predictive personality modeling. In ICWSM, 2013.
[16] L. Nie, Y.-L. Zhao, X. Wang, J. Shen, and T.-S. Chua. Learning to recommend descriptive tags for questions in social forums. TOIS, 2014.
[17] M. Pennacchiotti and A.-M. Popescu. Democrats, republicans and starbucks afficionados: user classification in twitter. In SIGKDD, 2011.
[18] L. A. Penner. Dispositional and organizational influences on sustained volunteerism: An interactionist perspective. JSI, 2002.
[19] L. A. Penner. Volunteerism and social problems: Making things better or worse? JSI, 2004.

7. CONCLUSIONS AND FUTURE WORK

This paper presented a novel scheme for multiple social network learning. The scheme takes the source confidence and source consistency into consideration by introducing regularization into the objective function.
We further demonstrated that the proposed scheme, designed for complete data, is also able to handle the real and more challenging cases where block-wise missing data exists. In particular, before feeding the data into the proposed MSNL model, we inferred the missing data via the NMF technique. Furthermore, we practically evaluated the proposed scheme in an interesting scenario of volunteerism tendency prediction, for which we developed a set of volunteer-oriented features to characterize users' volunteerism tendency. Experimental results demonstrated the effectiveness of our proposed scheme and verified the advantages of utilizing multiple social networks over a single source. Currently, we only consider solving a single task in the proposed scheme. In the future, we will extend our work to the context of multi-task learning.

[20] D. Quercia, M. Kosinski, D. Stillwell, and J. Crowcroft. Our twitter profiles, our selves: predicting personality with twitter. In SocialCom, 2011.
[21] D. Quercia, R. Lambiotte, D. Stillwell, M. Kosinski, and J. Crowcroft. The personality of popular facebook users. In CSCW, 2012.
[22] X. Shen, B. Tan, and C. Zhai. Implicit user modeling for personalized search. In CIKM, 2005.
[23] X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. In SIGIR, 2006.
[24] S. Xiang, L. Yuan, W. Fan, Y. Wang, P. M. Thompson, and J. Ye. Multi-source learning with block-wise missing data for alzheimer's disease prediction. In SIGKDD, 2013.
[25] L. Yuan, Y. Wang, P. M. Thompson, V. A. Narayan, and J. Ye. Multi-source learning for joint analysis of incomplete multi-modality neuroimaging data. In SIGKDD, 2012.
[26] D. Zhai, H. Chang, S. Shan, X. Chen, and W. Gao. Multiview metric learning with global consistency and local smoothness. TIST, 2012.
[27] D. Zhang, F. Wang, and L. Si. Composite hashing with multiple information sources. In SIGIR, 2011.
[28] J. Zhang and J. Huan. Inductive multi-task learning with multiple view data. In SIGKDD, 2012.
Acknowledgments

This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

[29] X. Zhu, Z.-Y. Ming, X. Zhu, and T.-S. Chua. Topic hierarchy construction for the organization of multi-source user generated contents. In SIGIR, 2013.