PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://repository.ubn.ru.nl/handle/2066/126946 Please be advised that this information was generated on 2020-06-16 and may be subject to change. Estimating Time to Event from Tweets Using Temporal Expressions Ali Hürriyetoǧlu, Nelleke Oostdijk, and Antal van den Bosch Centre for Language Studies Radboud University Nijmegen P.O. Box 9103, NL-6500 HD Nijmegen, The Netherlands {a.hurriyetoglu,n.oostdijk,a.vandenbosch}@let.ru.nl Abstract as possible. In this paper we explore a hybrid rule- based and data-driven method that exploits the ex- Given a stream of Twitter messages about plicit mentioning of temporal expressions to arrive an event, we investigate the predictive at accurate and early TTE estimations. power of temporal expressions in the mes- The idea of publishing future calendars with po- sages to estimate the time to event (TTE). tentially interesting events gathered (semi-) auto- From labeled training data we learn av- matically for subscribers, possibly with personal- erage TTE estimates of temporal expres- ization features and the option to harvest both so- sions and combinations thereof, and de- cial media and the general news, has been imple- fine basic rules to compute the time to mented already and is available through services event from temporal expressions, so that such as Zapaday2 , Daybees3 , and Songkick4 . To when they occur in a tweet that mentions our knowledge, based on the public interfaces of an event we can generate a prediction. We these platforms, these services perform directed show in a case study on soccer matches crawls of (structured) information sources, and that our estimations are off by about eight identify exact date and time references in posts on hours on average in terms of mean abso- these sources. 
They also manually curate event in- lute error. formation, or collect this through crowdsourcing. 1 Introduction In this study we do not use a rule-based tempo- ral tagger such as the HeidelTime tagger (Strötgen Textual information streams such as those pro- and Gertz, 2013), which searches for only a lim- duced by news media and by social media reflect ited set of temporal expressions. Instead, we pro- what is happening in the real world. These streams pose an approach that uses a large set of temporal often contain explicit pointers to future events that expressions, created by using seed terms and gen- may interest or concern a potentially large amount erative rules, and a training method that automati- of people. Besides media-specific markers such as cally determines the TTE estimate to be associated event-specific hashtags in messages on Twitter1 , with each temporal expression sequence in a data- these messages may contain explicit markers of driven way. Typically, rule-based systems do not place and time that help the receivers of the mes- use the implicit information provided by adverbs sage disambiguate and pinpoint the event on the (‘more’ in ‘three more days’) and relations be- map and calendar. tween non-subsequent elements, while machine- The automated analysis of streaming text mes- learning-based systems do not make use of the sages can play a role in catching these important temporal logic inherent to temporal expressions; events. Part of this analysis may be the identifi- they may identify ‘three more days’ as a temporal cation of the future start time of the event, so that expression but they lack the logical apparatus to the event can be placed on the calendar and appro- compute that this implies a TTE of about 3 × 24 priate action may be taken by the receiver of the hours. 
To make use of the best of both worlds message, such as ordering tickets, planning a se- we propose a hybrid system which uses informa- curity operation, or starting a journalistic investi- tion about the distribution of temporal expressions gation. The automated identification of the time to 2 event (TTE) should be as accurate and come early http://www.zapaday.com 3 http://daybees.com/ 1 4 http://twitter.com https://www.songkick.com/ 8 Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM) @ EACL 2014, pages 8–16, Gothenburg, Sweden, April 26-30 2014. c 2014 Association for Computational Linguistics as they are used in forward-looking social media 2 Related Work messages in a training set of known events, and combines this estimation method with an exten- Future-reference analysis in textual data has been sive set of regular expressions that capture a large studied from different angles. In the realm of space of possible Dutch temporal expressions. information retrieval the task is more commonly defined as seeking future temporal references in Thus, our proposed system analyzes social me- large document collections such as the Web by dia text to find information about future events, means of time queries (Baeza Yates, 2005). Var- and estimates how long it will take before the ious studies have used temporal expression ele- event takes place. The service offered by this sys- ments as features in an automatic setting to im- tem will be useful only if it generates accurate es- prove the relevance estimation of a web docu- timations of the time to event. Preferably, these ment (Dias et al., 2011; Jatowt and Au Yeung, accurate predictions should come as early as pos- 2011). Information relevant to event times has sible. Moreover, the system should be able, in been the focus of studies such as those by Becker the long run, to freely detect relevant future events et al. (2012) and Kawai et al. (2010). 
that are not yet on any schedule we know in any Our research is aimed at estimating the time to language represented on social media. For now, event of an upcoming event as precisely as possi- in this paper we focus on estimating the start- ble. Radinsky et al. (2012) approach this problem ing time of scheduled events, and use past and by learning from causality pairs in texts from long- known events for a controlled experiment involv- ranging news articles. Noro et al. (2006) describe ing Dutch twitter messages. a machine-learning-based system for the identifi- cation of the time period in which an event will For our experiment we collected tweets refer- happen, such as in the morning or at night. ring to scheduled Dutch premier league soccer matches. This type of event generally triggers Some case studies are focused on detecting many anticipatory discussions on social media events as early as possible as their unfolding is containing many temporal expressions. Given a fast. The study by Sakaki et al. (2010) describes a held-out soccer match not used during training, system which analyzes the flow of tweets in time our system predicts the time to the event based on and place mentioning an earthquake, to predict the individual tweets captured in a range from eight unfolding quake pattern which may in turn provide days before the event to the event time itself. Each just-in-time alerts to people residing in the loca- estimation is based on the temporal expressions tions that are likely to be struck shortly. Zielinski which occur in a particular twitter message. The et al. (2012) developed an early warning system mean absolute error of the predictions for each of to detect natural disasters in a multilingual fash- the 60 soccer matches in our data set is off by ion and thereby support crisis management. The about eight hours. The results are generated in a quick throughput of news in the Twitter network leave-one-out cross-validation setup5 . 
is the catalyst in these studies focusing on natu- ral disasters. In our study, we rather rely on the This paper starts with describing the relation of slower build-up of clues in messages in days be- our work to earlier research in Section 2. Section 3 fore an event, at a granularity level of hours. describes the overall experimental setup, including Ritter et al. (2012) aim to create a calendar of a description of the data, the temporal expressions events based on explicit date mentions and words that were used, our two baselines, and the evalua- typical of the event. They train on annotated open tion method used. Next, in Section 4 the results are domain event mentions and use a rule-based tem- presented. The results are analyzed and discussed poral tagger. We aim to offer a more generic so- in Section 5. We conclude with a summary of our lution that makes use of a wider range of tempo- main findings and make suggestions for the direc- ral expressions, including indirect and implicit ex- tion future research may take (Section 6). pressions. Weerkamp and De Rijke (2012) study this type of more generic patterns of anticipation in tweets, 5 but focus on personal future activities, while we Tweet ID’s, per tweet estimations, occurred time ex- pressions and rules can be found at http://www.ru.nl/ aim to predict as early as possible the time to lst/resources/ event of events that affect and interest many users. 9 Our estimations do not target time periods such as Afterwards, we restricted the data to tweets sent mornings or evenings but on the number of hours within eight days before the match8 and elimi- remaining to the event. nated all retweets. This reduced the number of TTE estimation of soccer matches has been the tweets in our final data set to 138,141 tweets. topic of several studies. 
Kunneman and Van den In this experiment we are working on the as- Bosch (2012) show that machine learning meth- sumption that the presence of a hashtag can be ods can differentiate between tweets posted be- used as proxy for the topic addressed in a tweet. fore, during, and after a soccer match. Estimat- Inspecting a sample of tweets referring to recent ing the time to event of future matches from tweet soccer games not part of our data set, we devel- streams has been studied by Hürriyetoglu et al. oped the hypothesis that the position of the hash- (2013), using local regression over word time se- tag may have an effect as regards the topicality of ries. In a related study, Tops et al. (2013) use sup- the tweet. Hashtags that occur in final position (i.e. port vector machines to classify the time to event they are tweet-final or are only followed by one in automatically discretized categories. At best or more other hashtags) are typically metatags and these studies are about a day off in their predic- therefore possibly more reliable as topic identifiers tions. Both studies investigate the use of temporal than tweet non-final hashtags which behave more expressions, but fail to leverage the utility of this like common content words in context. In order information source, most likely because they use to be able to investigate the possible effect that the limited sets of less than 20 regular expressions. In position of the hashtag might have, we split our this study we scale up the number of temporal ex- data in the following two subsets: pressions. FIN – comprising tweets in which the hashtag occurs in final position (as defined above); 3 Experimental Set-Up 84,533 tweets. We carried out a controlled case study in which we NFI – comprising tweets in which the hashtag oc- focused on Dutch premier league soccer matches curs in non-final position; 53,608 tweets. as a type of scheduled event. 
These types of games have the advantage that they occur frequently, Each tweet in our data set has a time stamp of have a distinctive hashtag by convention, and often the moment (in seconds) it was posted. Moreover, generate thousands to several tens of thousands of for each soccer match we know exactly when it tweets per match. took place. This information is used to calculate Below we first describe the collection and com- for each tweet the actual time that remains to the position of our data sets (Subsection 3.1) and the start of the event and the absolute error in estimat- temporal expressions which were used to base our ing the time to event. predictions upon (Subsection 3.2). Then, in Sub- 3.2 Temporal Expressions section 3.3, we describe our baselines and evalua- In the context of this paper temporal expressions tion method. are considered to be words or phrases which point 3.1 Data Sets to the point in time, the duration, or the frequency of an event. These may be exact, approximate, or We harvested tweets from twiqs.nl6 , a database even right out vague. Although in our current ex- of Dutch tweets collected from December 2010 periment we restrict ourselves to an eight-day pe- onwards. We selected the six best performing riod prior to an event, we chose to create a gross teams of the Dutch premier league in 2011 and list of all possible temporal expressions we could 20127 , and queried all matches in which these think of, so that we would not run the risk of over- teams played against each other in the calendar looking any items and the list can be used on fu- years 2011 and 2012. The collection procedure re- ture occasions even when the experimental set- sulted in 269,999 tweets referring to 60 individual ting is different. Thus the list also includes tem- matches. The number of tweets per event ranges poral expressions that refer to points in time out- from 321 to 35,464, with a median of 2,723 tweets. 
side the time span under investigation here, such 6 8 http://twiqs.nl An analysis of the tweet distribution shows that the eight- 7 Ajax, Feyenoord, PSV, FC Twente, AZ Alkmaar, and FC day window captures about 98% of the tweets in the larger Utrecht. data set from which it was derived. 10 as gisteren ‘yesterday’ or over een maand ‘in a ‘yet’ or ‘another’. As we have presently no way month from now’, and items indicating duration of distinguishing between the different senses and or frequency such as steeds ‘continuously’/‘time these items have at best an extremely vague tem- and again’. No attempt has been made to distin- poral sense so that they cannot be expected to con- guish between items as regards time reference (fu- tribute to estimating the time to event, we deciced ture time, past time) as many items can be used in to discard these.10 both fashions (compare for example vanmiddag in In order to capture event targeted expressions, vanmiddag ga ik naar de wedstrijd ‘this afternoon we treated domain terms such as wedstrijd ‘soc- I’m going to the match’ vs ik ben vanmiddag naar cer match’ as parts of temporal expressions in case de wedstrijd geweest ‘I went to the match this af- they co-occur with a temporal expression. ternoon’. For the items on the list no provisions were The list is quite comprehensive. Among the made for handling any kind of spelling variation, items included are single words, e.g. adverbs with the single exception of a small group of such as nu ‘now’, zometeen ‘immediately’, straks words (including ’s morgens ‘in the morning’, ’s ‘later on’, vanavond ‘this evening’, nouns such as middags ‘in the afternoon’ and ’s avonds ‘in the zondagmiddag ‘Sunday afternoon’, and conjunc- evening’) which use in their standard spelling the tions such as voordat ‘before’), but also word com- archaic ’s and abbreviations. As many authors binations and phrases such as komende woensdag of tweets tend to spell these words as smorgens, ‘next Wednesday. 
Temporal expressions of the lat- smiddags and savonds we decided to include these ter type were obtained by means of a set of 615 forms as well. seed terms and 70 rules, which generated a total of The items on the list that were obtained through around 53,000 temporal expressions. In addition, generation include temporal expressions such as there are a couple of hundred thousand temporal over 3 dagen ‘in 2 days’, nog 5 minuten ‘another expressions relating the number of minutes, hours, 5 minutes’, but also fixed temporal expressions days, or time of day;9 they include items contain- such as clock times.11 The rules handle frequently ing up to 9 words in a single temporal expression. observed variations in their notation, for example Notwithstanding the impressive number of items drie uur ‘three o’clock’ may be written in full or included, the list is bound to be incomplete. as 3:00, 3:00 uur, 3 u, 15.00, etc. We included prepositional phrases rather than Table 1 shows example temporal expression es- single prepositions so as to avoid generating too timates and applicable rules. The median estima- much noise. Many prepositions have several uses: tions are mostly lower than the mean estimations. they can be used to express time, but also for The distribution of the time to event (TTE) for example location. Compare voor in voor drie a single temporal expression often appears to be uur ‘before three o’clock’ and voor het stadion skewed towards lower values. The final column ‘in front of the stadium’. Moreover, prepositions of the table displays the applicable rules. The first are easily confused with parts of separable verbs six rules subtract the time the tweet was posted which in Dutch are abundant. (TT) from an average marker point, heuristically Various items on the list are inherently ambigu- determined, such as ‘today 20.00’ (i.e. 8 pm) for ous and only in one of their senses can be con- vanavond ‘tonight’. The second and third rules sidered temporal expressions. 
Examples are week from below state a TTE directly, again heuristi- ‘week’ but also ‘weak’ and dag ‘day’ but also cally set – over 2 uur ‘in 2 hours’ is directly trans- ‘goodbye’. For items like these, we found that lated to a TTE of 2. the different senses could fairly easily be distin- guished whenever the item was immediately pre- 3.3 Evaluation and Baselines ceded by an adjective such as komende and vol- Our approach to TTE estimation makes use of gende (both meaning ‘next’). For a few highly all temporal expressions in our temporal expres- frequent items this proved impossible. These are sion list that are found to occur in the tweets. A words like zo which can be either a temporal ad- 10 Note that nog does occur on the list as part of various verb (‘in a minute’; cf. zometeen) or an intensi- multiword expressions. Examples are nog twee dagen ‘an- fying adverb (‘so’), dan ‘then’ or ‘than’, and nog other two days’ and nog 10 min ‘10 more minutes’. 11 Dates are presently not covered by our rules but will be 9 For examples see Table 1 and Section 3.3. added in future. 11 Temporal Expression Gloss Mean TTE Median TTE Rule vandaag today 5.63 3.09 today 15:00 - TT h vanavond tonight 8.40 4.78 today 20:00 - TT h morgen tomorrow 20.35 18.54 tomorrow 15:00 - TT h zondag Sunday 72.99 67.85 Sunday 15:00 - TT h vandaag 12.30 today 12.20 2.90 2.75 today 12:30 - TT h om 16.30 at 16.30 1.28 1.36 today 16:30 - TT h over 2 uur in 2 hours 6.78 1.97 2h nog minder dan 1 u within 1 h 21.43 0.88 1h in het weekend during the weekend 90.58 91.70 No Rule Table 1: Examples of temporal expressions and their mean and median TTE estimation from training data. The final column lists the applicable rule, if any. Rules make use of the time of posting (Tweet Time, TT). match may be for a single item in the list (e.g. to be the same as the N minutes, days or zondag ‘Sunday’) or any combination of items whatever is mentioned. The rules take prece- (e.g. 
zondagmiddag, om 14.30 uur, ‘Sunday af- dence over the mean estimates learned from ternoon’, ‘at 2.30 pm’). There can be other words the training set; in between these expressions. We consider the 3. A second set of rules, referred to as the Dy- longest match, from left to right, in case we en- namic rules, is used to calculate the TTE dy- counter any overlap. namically, using the temporal expression and The experiment adopts a leave-one-out cross- the tweet’s time stamp. These rules apply validation setup. Each iteration uses all tweets to instances such as zondagmiddag om 3 uur from 59 events as training data. All tweets from ‘Sunday afternoon at 3 p.m.’. Here we as- the single held-out event are used as test set. sume that this is a future time reference on the In the FIN data set there are 42,396 tweets with basis of the fact that the tweets were posted at least one temporal expression, in the NFI data prior to the event. With temporal expressions set this is the case for 27,610 tweets. The number that are underspecified in that they do not pro- of tweets per event ranges from 66 to 7,152 (me- vide a specific point in time (hour), we pos- dian: 402.5; mean 706.6) for the FIN data set and tulate a particular time of day. For exam- from 41 to 3,936 (median 258; mean 460.1) for the ple, vandaag ‘today’ is understood as ‘today NFI data set. at 3 p.m., vanavond ‘this evening’ as ’this We calculate the TTE estimations for every evening at 8 p.m. and morgenochtend ‘to- tweet that contains at least one of the temporal ex- morrow morning’ as ‘tomorrow morning at pression or a combination in the test set. The esti- 10 a.m.’. Again, as was the case with the first mations for the test set are obtained as follows: set of rules, these rules take precedence over the mean or median estimates learned from 1. For each match (a single temporal expression the training data. 
or a combination of temporal expressions) the mean or median value for TTE is used The results for the estimated TTE are evaluated that was learned from the training set; in terms of the absolute error, i.e. the absolute dif- 2. Temporal expressions that denote an exact ference in hours between the estimated TTE and amount of time are interpreted by means of the actual remaining time to the event. rules that we henceforth refer to as Exact We established two naive baselines: the mean rules. This applies for example to tempo- and median TTE measured over all tweets of FIN ral expressions answering to patterns such as and NFI datasets. These baselines reflect a best over N {minuut | minuten | kwartier | uur | guess when no information is available other than uren | dag | dagen | week} ‘in N {minute | tweet count and TTE of each tweet. The mean minutes | quarter of an hour | hour | hours | TTE is 22.82 hours, and the median TTE is 3.63 day | days | week}’. Here the TTE is assumed hours before an event. The low values of the 12 baselines, especially the low median, reveal the sion). In contrast to Table 2, in which only a mild skewedness of the data: most tweets referring to difference could be observed between the median a soccer event are posted in the hours before the and mean variants of training, the figure shows a event. substantial difference. The estimations of the me- dian training variant are considerably more accu- 4 Results rate than the mean variant up to 24 hours before the event, after which the mean variant scores bet- Table 2 lists the overall mean absolute error (in ter. By virtue of the fact that the data is skewed number of hours) for the different variants. The (most tweets are posted within a few hours before results are reported separately for each of the two the event) the two methods attain a similar overall data sets (FIN and NFI) and for both sets aggre- mean absolute error, but it is clear that the median gated (FIN+NFI). 
For each of these three variants, variant produces considerably more accurate pre- the table lists the mean absolute error when only dictions when the event is still more than a day the basic data-driven TTE estimations are used away. (‘Basic’), when the Exact rules are added (‘+Ex.’), While Figure 1 provides insight into the ef- when the Dynamic rules are added (‘+Dyn’), and fect of median versus mean-based training with when both types of rules are added. The coverage the combined FIN+NFI dataset, we do not know of the combination (i.e. the number of tweets that whether training on either of the two subsets is match the expressions and the rules) is listed in the advantageous at different points in time. Table 3 bottom row of the table. shows the mean absolute error of systems trained A number of observations can be made. First, with the median variant on the two subsets of all training methods perform substantially better tweets, FIN and NFI, as well as the combination than the two baselines in all conditions. Second, FIN+NFI, split into nine time ranges. Interest- the TTE training method using the median as esti- ingly, the combination does not produce the lowest mation produces estimations that are about 1 hour errors close to the event. However, when the event more accurate than the mean-based estimations. is 24 hours away or more, both the FIN and NFI Third, adding Dynamic rules has a larger pos- systems generate increasingly large errors, while itive effect on prediction error than adding Ex- the FIN+NFI system continues to make quite ac- act rules. The bottom row in the table indicates curate predictions, remaining under 10 hours off that the rules do not increase the coverage of the even for the longest TTEs, confirming what we al- method substantially. When taken together and ready observed in Figure 1. 
added to the basic TTE estimation, the Dynamic and Exact rules do improve over the Basic estima- TTE range (h) FIN NFI FIN+NFI tion by two to three hours. 0 2.58 3.07 8.51 Finally, although the differences are small, Ta- 1–4 2.38 2.64 8.71 ble 2 reveals that training on hashtag-final tweets 5–8 3.02 3.08 8.94 (FIN) produces slightly better overall results (7.62 9–12 5.20 5.47 6.57 hours off at best) than training on hashtag-non- 13–24 5.63 5.54 6.09 final tweets (8.50 hours off) or the combination 25–48 13.14 15.59 5.81 (7.99 hours off), despite the fact that the training 49–96 17.20 20.72 6.93 set is smaller than that of the combination. 97–144 30.38 41.18 6.97 In the remainder of this section we report on > 144 55.45 70.08 9.41 systems that use all expressions and Exact and Dy- Table 3: Mean Absolute Error for the FIN, NFI, namic rules. and FIN+NFI systems in different TTE ranges. Whereas Table 2 displays the overall mean ab- solute errors of the different variants, Figure 1 dis- plays the results in terms of mean absolute error at 5 Analysis different points in time before the event, averaged over periods of one hour, for the two baselines and One of the results observed in Table 2 was the the FIN+NFI variant with the two training meth- relatively limited role of Exact rules, which were ods (i.e. taking the mean versus the median of the intended to deal with exact temporal expressions observed TTEs for a particular temporal expres- such as nog 5 minuten ‘5 more minutes’ and over 13 System FIN NFI FIN+NFI Basic +Ex. +Dyn. +Both Basic +Ex. +Dyn. +Both Basic +Ex. +Dyn. 
+Both Baseline Median 21.09 21.07 21.16 21.14 18.67 18.72 18.79 18.84 20.20 20.20 20.27 20.27 Baseline Mean 27.29 27.29 27.31 27.31 25.49 25.50 25.53 25.55 26.61 26.60 26.63 26.62 Training Median 10.38 10.28 7.68 7.62 11.09 11.04 8.65 8.50 10.61 10.54 8.03 7.99 Training Mean 11.62 11.12 8.73 8.29 12.43 11.99 9.53 9.16 11.95 11.50 9.16 8.76 Coverage 31,221 31,723 32.240 32,740 18,848 19,176 19,734 20,061 52,186 52,919 53,887 54,617 Table 2: Overall Mean Absolute Error for each method: difference in hours between the estimated time to event and the actual time to event, computed separately for the FIN and NFI subsets, and for the combination. For all variants a count of the number of matches is listed in the bottom row. een uur ‘in one hour’. This can be explained by tendency to use underspecified temporal expres- the fact that as long as the temporal expression is sions as the event is still some time away. Thus, related to the event we are targeting, the point in rather than volgende week zondag om 14.30 uur time is denoted exactly by the temporal expression ‘next week Sunday at 2.30 p.m.’ just volgende and the estimation obtained from the training data week is used, which makes it harder to estimate (the ‘Basic’ performance) will already be accurate, the time to event. leaving no room for the rules to improve on this. Closer inspection of some of the temporal The rules that deal with dynamic temporal expres- expressions which yielded large absolute errors sions, on the other hand, have quite some impact. suggests that these may be items that refer to As was explained in Section 3.2 our list of tem- subevents rather than the main event (i.e. the poral expressions was a gross list, including items match) we are targeting. Examples are eerst ‘first’, that were unlikely to occur in our present data. In daarna ‘then’, vervolgens ‘next’, and voordat ‘be- all we observed 770 of the 53,000 items listed, fore’. 
955 clock time rule matches, and 764 time ex- pressions which contain number of days, hours, 6 Conclusions and Future Work minutes etc. The temporal expressions observed most frequently in our data are:12 vandaag ‘today’ We have presented a method for the estimation of (10,037), zondag ‘Sunday’ (6,840), vanavond the TTE from single tweets referring to a future ‘tonight’ (5167), straks ‘later on’ (5,108), van- event. In a case study with Dutch soccer matches, middag ‘this afternoon’ (4,331), matchday ‘match we showed that estimations can be as accurate as day’ (2,803), volgende week ‘next week’ (1,480) about eight hours off, averaged over a time win- and zometeen ‘in a minute’ (1,405). dow of eight days. There is some variance in the 60 events on which we tested in a leave-one- Given the skewed distribution of tweets over the out validation setup: errors ranged between 4 and eight days prior to the event, it is not surprising to 13 hours, plus one exceptionally badly predicted find that nearly all of the most frequent items refer event with a 34-hour error. to points in time within close range of the event. Apart from nu ‘now’, all of these are somewhat The best system is able to stay within 10 hours vague about the exact point in time. There are, of prediction error in the full eight-day window. however, numerous items such as om 12:30 uur This best system uses a large set of hand-designed ‘at half past one’ and over ongeveer 45 minuten temporal expressions that in a training phase have ‘in about 45 minutes’) which are very specific and each been linked to a median TTE with which therefore tend to appear with middle to low fre- they occur in a training set. Together with these quencies.13 And while it is possible to state an data-driven TTE estimates, the system uses a set exact point in time even when the event is in the of rules that match on exact and indirect time ref- more distant future, we find that there is a clear erences. 
In a comparative experiment we showed that this combination worked better than only having the data-driven estimations.

We then tested whether it was more profitable to train on tweets that had the event hashtag at the end, as this is presumed to be more likely a meta-tag, and thus a more reliable clue that the tweet is about the event than when the hashtag is not in final position. Indeed we find that the overall predictions are more accurate, but only in the final hours before the event (when most tweets are posted). At 24 hours or more before the event, it turns out to be better to train on both hashtag-final and hashtag-non-final tweets.

Finally, we observed that the two variants of our method of estimating TTEs for single temporal expressions, taking the mean or the median, lead to dramatically different results, especially when the event is still a few days away, when an accurate time to event is actually desirable. The median-based estimations, which are generally smaller than the mean-based estimations, lead to a system that largely stays under 10 hours of error.

Figure 1: Curves showing the absolute error (in hours) in estimating the time to event over an 8-day period (-192 to 0 hours) prior to the event. The two baselines are compared to the TTE estimation methods using the mean and median variant.

Our study has a number of logical extensions into future research. First, our method is not bound to a single type of event, although we tested it in a controlled setting. With experiments on tweet streams related to different types of events the general applicability of the method could be tested: can we use the trained TTE estimations from our current study, or would we need to retrain per event type?

Second, we hardcoded a limited number of frequent spelling variations, where it would be a more generic solution to rely on a more systematic spelling normalization preprocessing step.

Third, so far we did not focus on determining the relevance of temporal expressions in case there are several time expressions in a single message; we treated all temporal expressions that occurred as contributing equally to the estimation. Identifying which temporal expressions are relevant in a single message is studied by Kanhabua et al. (2012).

Finally, our method is limited to temporal expressions. For estimating the time to event on the basis of tweets that do not contain temporal expressions, we could benefit from term-based approaches that consider any word or word n-gram as potentially predictive (Hürriyetoglu et al., 2013).

Acknowledgment

This research was supported by the Dutch national programme COMMIT as part of the Infiniti project.

12 The observed frequencies can be found between brackets.

13 While an expression such as om 12:30 uur has a frequency of 116, nog maar 8 uur en 35 minuten ‘only 8 hours and 35 minutes from now’ has a frequency of 1.

References

Ricardo Baeza Yates. 2005. Searching the future. In ACM SIGIR Workshop on Mathematical/Formal Methods for Information Retrieval (MF/IR 2005).

Hila Becker, Dan Iter, Mor Naaman, and Luis Gravano. 2012. Identifying content for planned events across social media sites. In Proceedings of the fifth ACM International Conference on Web Search and Data Mining, WSDM ’12, pages 533–542, New York, NY, USA. ACM.

Gaël Dias, Ricardo Campos, and Alípio Jorge. 2011. Future retrieval: What does the future talk about? In Proceedings SIGIR2011 Workshop on Enriching Information Retrieval (ENIR2011).

Ali Hürriyetoglu, Florian Kunneman, and Antal van den Bosch. 2013. Estimating the time between twitter messages and future events. In DIR, pages 20–23.

Adam Jatowt and Ching-man Au Yeung. 2011. Extracting collective expectations about the future from large text collections. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 1259–1264, New York, NY, USA. ACM.

Nattiya Kanhabua, Sara Romano, and Avaré Stewart. 2012. Identifying relevant temporal expressions for real-world events. In Proceedings of The SIGIR 2012 Workshop on Time-aware Information Access, Portland, OR.

Hideki Kawai, Adam Jatowt, Katsumi Tanaka, Kazuo Kunieda, and Keiji Yamada. 2010. Chronoseeker: Search engine for future and past events. In Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, ICUIMC ’10, pages 25:1–25:10, New York, NY, USA. ACM.

Florian A Kunneman and Antal van den Bosch. 2012. Leveraging unscheduled event prediction through mining scheduled event tweets. BNAIC 2012 The 24th Benelux Conference on Artificial Intelligence, page 147.

Taichi Noro, Takashi Inui, Hiroya Takamura, and Manabu Okumura. 2006. Time period identification of events in text. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, ACL-44, pages 1153–1160, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kira Radinsky, Sagie Davidovich, and Shaul Markovitch. 2012. Learning causality for news events prediction. In Proceedings of the 21st International Conference on World Wide Web, WWW ’12, pages 909–918, New York, NY, USA. ACM.

Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. 2012. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data mining, KDD ’12, pages 1104–1112, New York, NY, USA. ACM.

Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. 2010. Earthquake shakes twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM.

Jannik Strötgen and Michael Gertz. 2013. Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 47(2):269–298, Jun.

Hannah Tops, Antal van den Bosch, and Florian Kunneman. 2013. Predicting time-to-event from twitter messages. BNAIC 2013 The 24th Benelux Conference on Artificial Intelligence, pages 207–2014.

Wouter Weerkamp and Maarten De Rijke. 2012. Activity prediction: A twitter-based exploration. In Proceedings of the SIGIR 2012 Workshop on Time-aware Information Access, TAIA-2012, August.

Andrea Zielinski, Ulrich Bügel, L. Middleton, S. E. Middleton, L. Tokarchuk, K. Watson, and F. Chaves. 2012. Multilingual analysis of twitter news in support of mass emergency events. In A. Abbasi and N. Giesen, editors, EGU General Assembly Conference Abstracts, volume 14 of EGU General Assembly Conference Abstracts, pages 8085+, April.