
Andrea Ballatore - King's College London

https://kcl.academia.edu/AndreaBallatore Archived on 2026-04-25 09:46 UTC

Andrea Ballatore

King's College London, Department of Digital Humanities, Faculty Member

Birkbeck College, University of London, Geography, Faculty Member

Birkbeck College, University of London, Department for Geography, Environment and Development, Faculty Member

University of California, Santa Barbara, Center for Spatial Studies, Research Affiliate

Followers

304

Following

155

Co-authors

14

I am a (geographic) data scientist interested in cultural geo-analytics.

Journal Articles by Andrea Ballatore

Tracking museums' online responses to the Covid-19 pandemic: a study in museum analytics

by Andrea Ballatore and Valeri Katerinchuk

ACM Journal on Computing and Cultural Heritage, 2023

The COVID-19 pandemic led to the temporary closure of all museums in the UK, closing buildings and suspending all on-site activities. Museum agencies aim at mitigating and managing these impacts on the sector, in a context of chronic data scarcity. "Museums in the Pandemic" is an interdisciplinary project that utilises content scraped from museums' websites and social media posts in order to understand how the UK museum sector, currently comprising over 3,300 museums, has responded and is currently responding to the pandemic. A major part of the project has been the design of computational techniques to provide the project's museum studies experts with appropriate data and tools for undertaking this research, leveraging web analytics, natural language processing, and machine learning. In this methodological contribution, firstly, we developed techniques to retrieve and identify museum official websites and social media accounts (Facebook and Twitter). This supported the automated capture of large-scale online data about the entire UK museum sector. Secondly, we harnessed convolutional neural networks to extract activity indicators from unstructured text in order to detect museum behaviours, including openings, closures, fundraising, and staffing. This dynamic dataset is enabling the museum studies experts in the team to study patterns in the online presence of museums before, during, and after the pandemic, according to museum size, governance, accreditation, and location. CCS Concepts: • Applied computing → Arts and humanities; • Information systems → Digital libraries and archives; • Computing methodologies → Knowledge representation and reasoning.

A geography of UK museums

Transactions of the Institute of British Geographers, 2023

Museums are important centres of heritage, culture, education, and tourism. These diverse institutions operate in different ways, reaching different audiences and managing varied collections. Thanks to a novel database of unprecedented completeness produced by the project, this study provides a quantitative geography of museums in the UK, showing how about half of the sector had not been surveyed before. The presence of museums is mapped across several attributes, including museum size (estimated as yearly visits) and governance (government-led, independent, or university-led). Firstly, observing a snapshot of the sector in December 2017, we quantify and interpret the spatial distribution of museums, discussing its implications for access to museums, public service provision, resource allocation, and cultural tourism. Then, in a regional analysis, we study their density in relation to the local population, at the regional and Local Authority District scale, providing new evidence of the extent of spatial inequalities in the cultural sector, particularly relevant to a sector in which funding is mostly allocated at the regional level. At the crossing between human geography and museum studies, this inquiry reveals the centres and peripheries of this cultural sphere, providing fresh evidence of the presences and absences that shape cultural life across the UK.

Los Angeles as a digital place: The geographies of user-generated content

by Andrea Ballatore and Stef De Sabbata

Transactions in GIS, 2019

Online representations of places are becoming pivotal in informing our understanding of urban life. Content production on online platforms is grounded in the geography of their users and their digital infrastructure. These constraints shape place representation, that is the amount, quality, and type of digital information available in a geographic area. In this article, we study the place representation of user-generated content (UGC) in Los Angeles County, relating the spatial distribution of the data to its geo-demographic context. Adopting a comparative and multiplatform approach, this quantitative analysis investigates the spatial relationship between four diverse UGC datasets and their context at the census tract level (about 685,000 geo-located tweets, 9,700 Wikipedia pages, 4M OSM objects, and 180,000 Foursquare venues). The context includes the ethnicity, age, income, education, and deprivation of residents, as well as public infrastructure. An exploratory spatial analysis and regression-based models indicate that the four UGC platforms possess distinct geographies of place representation. To a moderate extent, the presence of Twitter, OpenStreetMap, and Foursquare data is influenced by population density, ethnicity, education, and income. However, each platform responds to different socio-economic factors and clusters emerge in disparate hotspots. Unexpectedly, Twitter data tends to be located in more dense, deprived areas, and the geography of Wikipedia appears peculiar and harder to explain. These trends are compared with previous findings for the area of Greater London.

The study compares UGC sources against socio-economic contexts, revealing significant biases in geographic data coverage in LA.

Creating a Knowledge Base to research the history of UK Museums through Rapid Application Development

by Andrea Ballatore and Fiona Candlin

ACM Journal on Computing and Cultural Heritage, 2019

Several studies have highlighted the absence of an integrated comprehensive dataset covering all of the UK’s museums, hence impeding research into the emergence, evolution and wider impact of the UK’s museums sector. “Mapping Museums” is an interdisciplinary project aiming to develop a comprehensive database of UK museums in existence since 1960, and to use this to undertake an evidence-based analysis of the development of the UK’s museum sector during 1960-2020 and the links to wider cultural, social, and political concerns. A major part of the project has been the iterative, participatory design of a new RDF/S Knowledge Base to store data and metadata relating to the UK’s museums, and a Web Application for the project’s humanities scholars to browse, search and visualise the data in order to investigate their research questions. This paper presents the challenges we faced in developing the Knowledge Base and Web Application, our methodology and methods, the design and implementation of the system, and the design, outcomes and implications of a user trial undertaken with a group of experts from the UK’s museums sector.

The Missing Museums: Accreditation, surveys, and an alternative account of the UK sector

Cultural Trends, 2019

Surveys of the UK museum sector have all had subtly different remits and so represent the sector in a variety of ways. In the last three decades, surveys have almost invariably focused on accredited institutions, thereby omitting almost half of the museums in the UK. In this article we examine how data collection became tied to the accreditation scheme, and its effects on how the museum sector was and is represented as a professionalised sphere. Yet, while it is important to understand the role of surveys in constructing the museum sector, this article goes beyond critique to show how the inclusion of unaccredited museums drastically changes the profile of the museum sector. We outline the inclusive approach that the Mapping Museums project team has taken with regards to data collection, and compare our findings with those that are produced when a survey is limited to accredited museums. In so doing, we sketch out an alternative, heterogeneous version of the UK museum sector and make recommendations based on that evidence.

Placing Wikimapia: An exploratory analysis

International Journal of Geographical Information Science, 2018

Wikimapia is a major privately-owned volunteered geographic information (VGI) project to collect information about places. Over the past ten years, Wikimapia has attracted hundreds of thousands of contributors and collected millions of data points, including towns, restaurants, lakes, and tourist attractions (http://wikimapia.org). Unlike OpenStreetMap, Wikimapia adopts a "placial" perspective, favouring rich descriptions over detailed geometries and encouraging the collection of textual and visual content about places with approximate footprints. In this article, we first trace the origin and development of Wikimapia as a for-profit project, intimately linked with search engine advertising. Drawing on an in-depth interview with a former developer, we analyse the project's data model and the characteristics of its community. As Wikimapia discussions are rife with copyright issues, we discuss the project's intellectual property, as well as its strategies for quality management. Second, we focus on the popularity of the project, which is crucial to the longevity and sustainability of VGI projects. Using behavioural data from Google Trends, we trace a geography of interest in Wikimapia, comparing it with that in OpenStreetMap, from a temporal and spatial perspective. While OpenStreetMap attracts more interest in high-income countries, Wikimapia emerges as relatively more popular in low- and middle-income countries, countering the received notion of VGI as a Global North phenomenon. Our study suggests that Wikimapia's popularity is steadily declining.

Digital Hegemonies: The Localness of Search Engine Results

Every day, billions of Internet users rely on search engines to find information about places to make decisions about tourism, shopping, and countless other economic activities. In an opaque process, search engines assemble digital content produced in a variety of locations around the world and make it available to large cohorts of consumers. Although these representations of place are increasingly important and consequential, little is known about their characteristics and possible biases. Analyzing a corpus of Google search results generated for 188 capital cities, this article investigates the geographic dimension of search results, focusing on searches such as “Lagos” and “Rome” on different localized versions of the engine. This study answers these questions: To what degree is this city-related information locally produced and diverse? Which countries are producing their own representations and which are represented by others? Through a new indicator of localness of search results, we identify the factors that contribute to shape this uneven digital geography, combining several development indicators. The development of the publishing industry and scientific production appears as a fairly strong predictor of localness of results. This empirical knowledge will support efforts to curb the digital divide, promoting a more inclusive, democratic information society.

Personalizing Maps

Geographic maps constitute a ubiquitous medium through which we understand, construct, and navigate our natural and built surroundings. At the intersection of the explosion of geographic information online, data-mining techniques, and the increasing popularity of Web maps, a novel possibility has emerged: Instead of generating one map for large numbers of users, user profiling and implicit feedback analysis can support creation of a different map for each person. The automated personalization of the map-making process is still in its infancy but has the potential to provide more relevant maps to millions of users worldwide.

Conceptualising the geographic world: The dimensions of negotiation in crowdsourced cartography

International Journal of Geographical Information Science, 2015

In crowdsourced cartographic projects, mappers coordinate their efforts through online tools to produce digital geospatial artefacts, such as maps and gazetteers, which were once the exclusive territory of professional surveyors and cartographers. In order to produce meaningful and coherent data, contributors need to negotiate a shared conceptualisation that defines the domain concepts, such as road, building, train station, forest, and lake, enabling the communication of geographic knowledge. Considering the OpenStreetMap Wiki website as a case study, this article investigates the nature of this negotiation, driven by a small group of mappers in a context of high contribution inequality. Despite the apparent consensus on the conceptualisation, the negotiation keeps unfolding in a tension between alternative representations, which are often incommensurable, i.e., hard to integrate and reconcile. In this study, we identify six complementary dimensions of incommensurability that recur in the negotiation: (i) ontology, (ii) cartography, (iii) culture and language, (iv) lexical definitions, (v) granularity, and (vi) semantic overload and duplication.

Analyzed negotiation types on OSM Wiki, finding terms related to mapping attracted significant incommensurability.

Google chemtrails: A methodology to analyze topic representation in search engine results

First Monday, Jul 2015

Search engine results influence the visibility of different viewpoints in political, cultural, and scientific debates. Treating search engines as editorial products with intrinsic biases can help understand the structure of information flows in new media. This paper outlines an empirical methodology to analyze the representation of topics in search engines, reducing the spatial and temporal biases in the results. As a case study, the methodology is applied to 15 popular conspiracy theories, examining type of content and ideological bias, demonstrating how this approach can inform debates in this field, specifically in relation to the representation of non-mainstream positions, the suppression of controversies and relativism.

Despite reducing biases, the methodology acknowledges limitations due to the changing nature of search engine results over time.

A Structural-Lexical Measure of Semantic Similarity for Geo-Knowledge Graphs

by Andrea Ballatore and Michela Bertolotto

Graphs have become ubiquitous structures to encode geographic knowledge online. The Semantic Web’s linked open data, folksonomies, wiki websites and open gazetteers can be seen as geo-knowledge graphs, that is labeled graphs whose vertices represent geographic concepts and whose edges encode the relations between concepts. To compute the semantic similarity of concepts in such structures, this article defines the network-lexical similarity measure (NLS). This measure estimates similarity by combining two complementary sources of information: the network similarity of vertices and the semantic similarity of the lexical definitions. NLS is evaluated on the OpenStreetMap Semantic Network, a crowdsourced geo-knowledge graph that describes geographic concepts. The hybrid approach outperforms both network and lexical measures, obtaining very strong correlation with the similarity judgments of human subjects.

NLS is empirically validated using a real-world GKG, OpenStreetMap, containing 5000 concepts and facilitating in-depth evaluation.

The myth of the Digital Earth between fragmentation and wholeness

Daring predictions of the proximate future can establish shared discursive frameworks, mobilize capital, and steer complex processes. Among the prophetic visions that encouraged and accompanied the development of new communication technologies was the “Digital Earth,” described in a 1998 speech by Al Gore as a high-resolution representation of the planet to share and analyze detailed information about its state. This article traces a genealogy of the Digital Earth as a techno-scientific myth, locating it in a constellation of media futures, arguing that a common subtext of these envisionments consists of a dream of wholeness, an afflatus to overcome perceived fragmentation among humans, and between humans and the Earth.

Gore proposed a 'Digital Earth program' to generate a new global climate model leveraging vast data for enhanced climate predictions.

The Semantic Similarity Ensemble

Journal of Spatial Information Science, 2013

Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble’s member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.

The empirical evidence suggests that the SSE can be beneficial when selecting a semantic similarity measure under uncertainty.
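
The "panel of experts" idea in the abstract above can be illustrated with a minimal Python sketch. It assumes the ensemble aggregates its members by a plain mean of their scores, which is an assumption made here for illustration and not necessarily the aggregation used in the article. The two member measures (`char_overlap`, `length_ratio`) are hypothetical stand-ins, not real geo-semantic measures.

```python
from statistics import mean


def ensemble_similarity(measures, term_a, term_b):
    """Return the mean of the scores given by each member measure.

    Assumption: a plain mean is used as the aggregation rule.
    """
    return mean(measure(term_a, term_b) for measure in measures)


# Hypothetical toy members: crude string heuristics standing in for
# real similarity measures, each returning a score in [0, 1].
def char_overlap(term_a, term_b):
    """Jaccard overlap between the character sets of the two terms."""
    chars_a, chars_b = set(term_a), set(term_b)
    return len(chars_a & chars_b) / len(chars_a | chars_b)


def length_ratio(term_a, term_b):
    """Ratio of the shorter term length to the longer one."""
    return min(len(term_a), len(term_b)) / max(len(term_a), len(term_b))


measures = [char_overlap, length_ratio]
score = ensemble_similarity(measures, "lake", "pond")
```

In an evaluation along the lines the abstract describes, such ensemble scores would be correlated with human similarity ratings and compared against the scores of each member alone.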

Defacing the map: Cartographic vandalism in the digital commons

This article addresses the emergent phenomenon of carto-vandalism, the intentional defacement of collaborative cartographic digital artefacts in the context of volunteered geographic information. Through a qualitative analysis of reported incidents in WikiMapia and OpenStreetMap, a typology of this kind of vandalism is outlined, including play, ideological, fantasy, artistic and industrial carto-vandalism, as well as carto-spam. Two families of counter-strategies deployed in amateur mapping communities are discussed. First, the contributors organize forms of policing, based on volunteered community involvement, patrolling the maps and reporting incidents. Second, the detection of carto-vandalism can be supported by automated tools, based either on explicit rules or on machine learning.

Linking Geographic Vocabularies through WordNet

Annals of GIS, 2014

The linked open data (LOD) paradigm has emerged as a promising approach to structuring and sharing geospatial information. One of the major obstacles to this vision lies in the difficulties found in the automatic integration between heterogeneous vocabularies and ontologies that provide the semantic backbone of the growing constellation of open geo-knowledge bases. In this article, we show how to utilize WordNet as a semantic hub to increase the integration of LOD. With this purpose in mind, we devise Voc2WordNet, an unsupervised mapping technique between a given vocabulary and WordNet, combining intensional and extensional aspects of the geographic terms. Voc2WordNet is evaluated against a sample of human-generated alignments with the OpenStreetMap (OSM) Semantic Network, a crowdsourced geospatial resource, and the GeoNames ontology, the vocabulary of a large digital gazetteer. These empirical results indicate that the approach can obtain high precision and recall.

An evaluative baseline for geo-semantic relatedness and similarity

GeoInformatica, Jan 2014

In geographic information science and semantics, the computation of semantic similarity is widely recognised as key to supporting a vast number of tasks in information integration and retrieval. By contrast, the role of geo-semantic relatedness has been largely ignored. In natural language processing, semantic relatedness is often confused with the more specific semantic similarity. In this article, we discuss a notion of geo-semantic relatedness based on Lehrer’s semantic fields, and we compare it with geo-semantic similarity. We then describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a new open dataset designed to evaluate computational measures of geo-semantic relatedness and similarity. This dataset is larger than existing datasets of this kind, and includes 97 geographic terms combined into 50 term pairs rated by 203 human subjects. GeReSiD is available online and can be used as an evaluation baseline to determine empirically to what degree a given computational model approximates geo-semantic relatedness and similarity.

The web will kill them all: new media, digital utopia, and political struggle in the Italian 5-Star Movement

Media, Culture & Society, Jan 2014

This article examines the role of discourses about new media technology and the web in the rise of the 5-Star Movement (Movimento 5 Stelle, or M5S) in Italy. Founded by comedian and activist Beppe Grillo and web entrepreneur Gianroberto Casaleggio in 2009, this movement succeeded in becoming the second largest party at the 2013 national elections in Italy. This article aims to discuss how elements of digital utopia and web-centric discourses have been inserted into the movement’s political message, and how the construction of the web as a myth has shaped the movement’s discourse and political practice. The 5-Star Movement is compared and contrasted with other social and political movements in western countries which have displayed a similar emphasis on new media, such as the Occupy movement, the Indignados movement, and the Pirate Parties in Sweden and Germany. By adopting and mutating cyber-utopian discourses from the so-called Californian ideology, the movement symbolically identifies itself with the web. The traditional political establishment is associated with “old” media (television, radio, and the printed press), and represented as a “walking dead,” doomed to be superseded and buried by a web-based direct democracy.

Computing the semantic similarity of geographic terms using volunteered lexical definitions

International Journal of Geographical Information Science, Sep 2013

Volunteered geographic information (VGI) is generated by heterogenous ‘information communities’ that co-operate to produce reusable units of geographic knowledge. A consensual lexicon is a key factor to enable this open production model. Lexical definitions help demarcate the boundaries of terms, forming a thin semantic ground on which knowledge can travel. In VGI, lexical definitions often appear to be inconsistent, circular, noisy and highly idiosyncratic. Computing the semantic similarity of these ‘volunteered lexical definitions’ has a wide range of applications in GIScience, including information retrieval, data mining and information integration. This article describes a knowledge-based approach to quantify the semantic similarity of lexical definitions. Grounded in the recursive intuition that similar terms are described using similar terms, the approach relies on paraphrase-detection techniques and the lexical database WordNet. The cognitive plausibility of the approach is evaluated in the context of the OpenStreetMap (OSM) Semantic Network, obtaining high correlation with human judgements. Guidelines are provided for the practical usage of the approach.

Good location, terrible food: detecting feature sentiment in user-generated reviews

by Andrea Ballatore, Marie-aude Aufaure, and Ilaria Tiddi

Social Network Analysis and Mining, 2013

A growing corpus of online informal reviews is generated every day by non-experts, on social networks and blogs, about an unlimited range of products and services. Users do not only express holistic opinions, but often focus on specific features of their interest. The automatic understanding of “what people think” at the feature level can greatly support decision making, both for consumers and producers. In this paper, we present an approach to feature-level sentiment detection that integrates natural language processing with statistical techniques, in order to extract users’ opinions about specific features of products and services from user-generated reviews. First, we extract domain features, and each review is modelled as a lexical dependency graph. Second, for each review, we estimate the polarity relative to the features by leveraging the syntactic dependencies between the terms. The approach is evaluated against a ground truth consisting of a set of user-generated reviews, manually annotated by 39 human subjects and available online, showing its human-like ability to capture feature-level opinions.

Fleiss' kappa score of .65 indicated high inter-rater agreement on polarity judgments in user evaluations.
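
For readers unfamiliar with the agreement statistic mentioned above, the following Python sketch computes Fleiss' kappa from its standard textbook definition. The input layout (one row per rated item, one column per category, each cell counting the raters who chose that category) and the toy numbers are assumptions made for illustration; they are not data from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts[i][j] is the number of raters who assigned item i to
    category j. Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement for each item, then averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Agreement expected by chance from the overall category shares.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in shares)

    return (observed - expected) / (1 - expected)


# Hypothetical toy table: 2 reviews, 3 raters, categories (positive, negative).
toy_counts = [[3, 0], [0, 3]]
kappa = fleiss_kappa(toy_counts)
```

A value of 1 means the raters agree perfectly; values near 0 mean agreement is no better than chance.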

Geographic knowledge extraction and semantic similarity in OpenStreetMap

In recent years, a web phenomenon known as Volunteered Geographic Information (VGI) has produced large crowdsourced geographic data sets. OpenStreetMap (OSM), the leading VGI project, aims at building an open-content world map through user contributions. OSM semantics consists of a set of properties (called ‘tags’) describing geographic classes, whose usage is defined by project contributors on a dedicated Wiki website. Because of its simple and open semantic structure, the OSM approach often results in noisy and ambiguous data, limiting its usability for analysis in information retrieval, recommender systems and data mining. Devising a mechanism for computing the semantic similarity of the OSM geographic classes can help alleviate this semantic gap. The contribution of this paper is twofold. It consists of (1) the development of the OSM Semantic Network by means of a web crawler tailored to the OSM Wiki website; this semantic network can be used to compute semantic similarity through co-citation measures, providing a novel semantic tool for OSM and GIS communities; (2) a study of the cognitive plausibility (i.e. the ability to replicate human judgement) of co-citation algorithms when applied to the computation of semantic similarity of geographic concepts. Empirical evidence supports the usage of co-citation algorithms (with SimRank showing the highest plausibility) to compute concept similarity in a crowdsourced semantic network.
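
As a rough illustration of the co-citation idea named in the abstract above, here is a minimal Python sketch of the textbook SimRank iteration: two nodes are similar if they are linked from similar nodes. The tiny graph, the node names, the decay value of 0.8, and the iteration count are all assumptions chosen for illustration; this is not the OSM Semantic Network or the configuration used in the paper.

```python
from itertools import product


def simrank(in_links, decay=0.8, iterations=10):
    """Naive SimRank over a graph given as node -> list of in-neighbours.

    Every node that appears as an in-neighbour must also be a key.
    Returns a dict mapping (node_a, node_b) to a similarity in [0, 1].
    """
    nodes = sorted(in_links)
    # Start from the identity: each node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            in_a, in_b = in_links[a], in_links[b]
            if not in_a or not in_b:
                # Nodes with no in-links cannot be co-cited.
                updated[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            updated[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = updated
    return sim


# Hypothetical toy graph: an "amenity" page links to both "pub" and "bar".
toy_in_links = {
    "amenity": [],
    "pub": ["amenity"],
    "bar": ["amenity"],
    "lake": [],
}
scores = simrank(toy_in_links)
```

In this toy graph, "pub" and "bar" are cited by the same page and so receive a high score, while "lake" shares no citing page with either and scores zero.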
