Journal Articles by Andrea Ballatore

ACM Journal on Computing and Cultural Heritage, 2023
The COVID-19 pandemic led to the temporary closure of all museums in the UK, closing buildings and suspending all on-site activities. Museum agencies aim at mitigating and managing these impacts on the sector, in a context of chronic data scarcity. "Museums in the Pandemic" is an interdisciplinary project that utilises content scraped from museums' websites and social media posts in order to understand how the UK museum sector, currently comprising over 3,300 museums, has responded and is currently responding to the pandemic. A major part of the project has been the design of computational techniques to provide the project's museum studies experts with appropriate data and tools for undertaking this research, leveraging web analytics, natural language processing, and machine learning. In this methodological contribution, firstly, we developed techniques to retrieve and identify museum official websites and social media accounts (Facebook and Twitter). This supported the automated capture of large-scale online data about the entire UK museum sector. Secondly, we harnessed convolutional neural networks to extract activity indicators from unstructured text in order to detect museum behaviours, including openings, closures, fundraising, and staffing. This dynamic dataset is enabling the museum studies experts in the team to study patterns in the online presence of museums before, during, and after the pandemic, according to museum size, governance, accreditation, and location. CCS Concepts: • Applied computing → Arts and humanities; • Information systems → Digital libraries and archives; • Computing methodologies → Knowledge representation and reasoning.

Transactions of the Institute of British Geographers, 2023
Museums are important centres of heritage, culture, education, and tourism. These diverse institutions operate in different ways, reaching different audiences and managing varied collections. Thanks to a novel database of unprecedented completeness produced by the project, this study provides a quantitative geography of museums in the UK, showing how about half of the sector had not been surveyed before. The presence of museums is mapped across several attributes, including museum size (estimated as yearly visits) and governance (government‐led, independent, or university‐led). Firstly, observing a snapshot of the sector in December 2017, we quantify and interpret the spatial distribution of museums, discussing its implications for access to museums, public service provision, resource allocation, and cultural tourism. Then, in a regional analysis, we study their density in relation to the local population, at the regional and Local Authority District scale, providing new evidence of the extent of spatial inequalities in the cultural sector, particularly relevant to a sector in which funding is mostly allocated at the regional level. At the crossing between human geography and museum studies, this inquiry reveals the centres and peripheries of this cultural sphere, providing fresh evidence of the presences and absences that shape cultural life across the UK.

Transactions in GIS, 2019
Online representations of places are becoming pivotal in informing our understanding of urban life. Content production on online platforms is grounded in the geography of their users and their digital infrastructure. These constraints shape place representation, that is the amount, quality, and type of digital information available in a geographic area. In this article, we study the place representation of user-generated content (UGC) in Los Angeles County, relating the spatial distribution of the data to its geo-demographic context. Adopting a comparative and multiplatform approach, this quantitative analysis investigates the spatial relationship between four diverse UGC datasets and their context at the census tract level (about 685,000 geo-located tweets, 9,700 Wikipedia pages, 4M OSM objects, and 180,000 Foursquare venues). The context includes the ethnicity, age, income, education, and deprivation of residents, as well as public infrastructure. An exploratory spatial analysis and regression-based models indicate that the four UGC platforms possess distinct geographies of place representation. To a moderate extent, the presence of Twitter, OpenStreetMap, and Foursquare data is influenced by population density, ethnicity, education, and income. However, each platform responds to different socio-economic factors and clusters emerge in disparate hotspots. Unexpectedly, Twitter data tends to be located in more dense, deprived areas, and the geography of Wikipedia appears peculiar and harder to explain. These trends are compared with previous findings for the area of Greater London.

ACM Journal on Computing and Cultural Heritage, 2019
Several studies have highlighted the absence of an integrated comprehensive dataset covering all of the UK’s museums, hence impeding research into the emergence, evolution and wider impact of the UK’s museums sector. “Mapping Museums” is an interdisciplinary project aiming to develop a comprehensive database of UK museums in existence since 1960, and to use this to undertake an evidence-based analysis of the development of the UK’s museum sector during 1960-2020 and the links to wider cultural, social, and political concerns. A major part of the project has been the iterative, participatory design of a new RDF/S Knowledge Base to store data and metadata relating to the UK’s museums, and a Web Application for the project’s humanities scholars to browse, search and visualise the data in order to investigate their research questions. This paper presents the challenges we faced in developing the Knowledge Base and Web Application, our methodology and methods, the design and implementation of the system, and the design, outcomes and implications of a user trial undertaken with a group of experts from the UK’s museums sector.

Cultural Trends, 2019
Surveys of the UK museum sector have all had subtly different remits and so represent the sector in a variety of ways. In the last three decades, surveys have almost invariably focused on accredited institutions, thereby omitting almost half of the museums in the UK. In this article we examine how data collection became tied to the accreditation scheme, and its effects on how the museum sector was and is represented as a professionalised sphere. Yet, while it is important to understand the role of surveys in constructing the museum sector, this article goes beyond critique to show how the inclusion of unaccredited museums drastically changes the profile of the museum sector. We outline the inclusive approach that the Mapping Museums project team has taken with regards to data collection, and compare our findings with those that are produced when a survey is limited to accredited museums. In so doing, we sketch out an alternative, heterogeneous version of the UK museum sector and make recommendations based on that evidence.

International Journal of Geographical Information Science, 2018
Wikimapia is a major privately-owned volunteered geographic information (VGI) project to collect information about places. Over the past ten years, Wikimapia has attracted hundreds of thousands of contributors and collected millions of data points, including towns, restaurants, lakes, and tourist attractions (http://wikimapia.org). Unlike OpenStreetMap, Wikimapia adopts a "placial" perspective, favouring rich descriptions over detailed geometries and encouraging the collection of textual and visual content about places with approximate footprints. In this article, we first trace the origin and development of Wikimapia as a for-profit project, intimately linked with search engine advertising. Drawing on an in-depth interview with a former developer, we analyse the project's data model and the characteristics of its community. As Wikimapia discussions are rife with copyright issues, we discuss the project's intellectual property, as well as its strategies for quality management. Second, we focus on the popularity of the project, which is crucial to the longevity and sustainability of VGI projects. Using behavioural data from Google Trends, we trace a geography of interest in Wikimapia, comparing with that in OpenStreetMap, from a temporal and spatial perspective. While OpenStreetMap attracts more interest in high-income countries, Wikimapia emerges as relatively more popular in low- and middle-income countries, countering the received notion of VGI as a Global North phenomenon. Our study suggests that Wikimapia's popularity is steadily declining.

Every day, billions of Internet users rely on search engines to find information about places to make decisions about tourism, shopping, and countless other economic activities. In an opaque process, search engines assemble digital content produced in a variety of locations around the world and make it available to large cohorts of consumers. Although these representations of place are increasingly important and consequential, little is known about their characteristics and possible biases. Analyzing a corpus of Google search results generated for 188 capital cities, this article investigates the geographic dimension of search results, focusing on searches such as “Lagos” and “Rome” on different localized versions of the engine. This study answers these questions: To what degree is this city-related information locally produced and diverse? Which countries are producing their own representations and which are represented by others? Through a new indicator of localness of search results, we identify the factors that contribute to shape this uneven digital geography, combining several development indicators. The development of the publishing industry and scientific production appears as a fairly strong predictor of localness of results. This empirical knowledge will support efforts to curb the digital divide, promoting a more inclusive, democratic information society.
Geographic maps constitute a ubiquitous medium through which we understand, construct, and navigate our natural and built surroundings. At the intersection of the explosion of geographic information online, data-mining techniques, and the increasing popularity of Web maps, a novel possibility has emerged: Instead of generating one map for large numbers of users, user profiling and implicit feedback analysis can support creation of a different map for each person. The automated personalization of the map-making process is still in its infancy but has the potential to provide more relevant maps to millions of users worldwide.

International Journal of Geographical Information Science, 2015
In crowdsourced cartographic projects, mappers coordinate their efforts through online tools to produce digital geospatial artefacts, such as maps and gazetteers, which were once the exclusive territory of professional surveyors and cartographers. In order to produce meaningful and coherent data, contributors need to negotiate a shared conceptualisation that defines the domain concepts, such as road, building, train station, forest, and lake, enabling the communication of geographic knowledge. Considering the OpenStreetMap Wiki website as a case study, this article investigates the nature of this negotiation, driven by a small group of mappers in a context of high contribution inequality. Despite the apparent consensus on the conceptualisation, the negotiation keeps unfolding in a tension between alternative representations, which are often incommensurable, i.e., hard to integrate and reconcile. In this study, we identify six complementary dimensions of incommensurability that recur in the negotiation: (i) ontology, (ii) cartography, (iii) culture and language, (iv) lexical definitions, (v) granularity, and (vi) semantic overload and duplication.
First Monday, Jul 2015
Search engine results influence the visibility of different viewpoints in political, cultural, and scientific debates. Treating search engines as editorial products with intrinsic biases can help understand the structure of information flows in new media. This paper outlines an empirical methodology to analyze the representation of topics in search engines, reducing the spatial and temporal biases in the results. As a case study, the methodology is applied to 15 popular conspiracy theories, examining type of content and ideological bias, demonstrating how this approach can inform debates in this field, specifically in relation to the representation of non-mainstream positions, the suppression of controversies and relativism.

Graphs have become ubiquitous structures to encode geographic knowledge online. The Semantic Web’s linked open data, folksonomies, wiki websites and open gazetteers can be seen as geo-knowledge graphs, that is labeled graphs whose vertices represent geographic concepts and whose edges encode the relations between concepts. To compute the semantic similarity of concepts in such structures, this article defines the network-lexical similarity measure (NLS). This measure estimates similarity by combining two complementary sources of information: the network similarity of vertices and the semantic similarity of the lexical definitions. NLS is evaluated on the OpenStreetMap Semantic Network, a crowdsourced geo-knowledge graph that describes geographic concepts. The hybrid approach outperforms both network and lexical measures, obtaining very strong correlation with the similarity judgments of human subjects.
Daring predictions of the proximate future can establish shared discursive frameworks, mobilize capital, and steer complex processes. Among the prophetic visions that encouraged and accompanied the development of new communication technologies was the “Digital Earth,” described in a 1998 speech by Al Gore as a high-resolution representation of the planet to share and analyze detailed information about its state. This article traces a genealogy of the Digital Earth as a techno-scientific myth, locating it in a constellation of media futures, arguing that a common subtext of these envisionments consists of a dream of wholeness, an afflatus to overcome perceived fragmentation among humans, and between humans and the Earth.

Journal of Spatial Information Science, 2013
Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble’s member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.
This article addresses the emergent phenomenon of carto-vandalism, the intentional defacement of collaborative cartographic digital artefacts in the context of volunteered geographic information. Through a qualitative analysis of reported incidents in WikiMapia and OpenStreetMap, a typology of this kind of vandalism is outlined, including play, ideological, fantasy, artistic and industrial carto-vandalism, as well as carto-spam. Two families of counter-strategies deployed in amateur mapping communities are discussed. First, the contributors organize forms of policing, based on volunteered community involvement, patrolling the maps and reporting incidents. Second, the detection of carto-vandalism can be supported by automated tools, based either on explicit rules or on machine learning.

Annals of GIS, 2014
The linked open data (LOD) paradigm has emerged as a promising approach to structuring and sharing geospatial information. One of the major obstacles to this vision lies in the difficulties found in the automatic integration between heterogeneous vocabularies and ontologies that provide the semantic backbone of the growing constellation of open geo-knowledge bases. In this article, we show how to utilize WordNet as a semantic hub to increase the integration of LOD. With this purpose in mind, we devise Voc2WordNet, an unsupervised mapping technique between a given vocabulary and WordNet, combining intensional and extensional aspects of the geographic terms. Voc2WordNet is evaluated against a sample of human-generated alignments with the OpenStreetMap (OSM) Semantic Network, a crowdsourced geospatial resource, and the GeoNames ontology, the vocabulary of a large digital gazetteer. These empirical results indicate that the approach can obtain high precision and recall.

GeoInformatica, Jan 2014
In geographic information science and semantics, the computation of semantic similarity is widely recognised as key to supporting a vast number of tasks in information integration and retrieval. By contrast, the role of geo-semantic relatedness has been largely ignored. In natural language processing, semantic relatedness is often confused with the more specific semantic similarity. In this article, we discuss a notion of geo-semantic relatedness based on Lehrer’s semantic fields, and we compare it with geo-semantic similarity. We then describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a new open dataset designed to evaluate computational measures of geo-semantic relatedness and similarity. This dataset is larger than existing datasets of this kind, and includes 97 geographic terms combined into 50 term pairs rated by 203 human subjects. GeReSiD is available online and can be used as an evaluation baseline to determine empirically to what degree a given computational model approximates geo-semantic relatedness and similarity.

Media, Culture & Society, Jan 2014
This article examines the role of discourses about new media technology and the web in the rise of the 5-Star Movement (Movimento 5 Stelle, or M5S) in Italy. Founded by comedian and activist Beppe Grillo and web entrepreneur Gianroberto Casaleggio in 2009, this movement succeeded in becoming the second largest party at the 2013 national elections in Italy. This article aims to discuss how elements of digital utopia and web-centric discourses have been inserted into the movement’s political message, and how the construction of the web as a myth has shaped the movement’s discourse and political practice. The 5-Star Movement is compared and contrasted with other social and political movements in western countries which have displayed a similar emphasis on new media, such as the Occupy movement, the Indignados movement, and the Pirate Parties in Sweden and Germany. By adopting and mutating cyber-utopian discourses from the so-called Californian ideology, the movement symbolically identifies itself with the web. The traditional political establishment is associated with “old” media (television, radio, and the printed press), and represented as a “walking dead,” doomed to be superseded and buried by a web-based direct democracy.

International Journal of Geographical Information Science, Sep 2013
Volunteered geographic information (VGI) is generated by heterogeneous ‘information communities’ that co-operate to produce reusable units of geographic knowledge. A consensual lexicon is a key factor to enable this open production model. Lexical definitions help demarcate the boundaries of terms, forming a thin semantic ground on which knowledge can travel. In VGI, lexical definitions often appear to be inconsistent, circular, noisy and highly idiosyncratic. Computing the semantic similarity of these ‘volunteered lexical definitions’ has a wide range of applications in GIScience, including information retrieval, data mining and information integration. This article describes a knowledge-based approach to quantify the semantic similarity of lexical definitions. Grounded in the recursive intuition that similar terms are described using similar terms, the approach relies on paraphrase-detection techniques and the lexical database WordNet. The cognitive plausibility of the approach is evaluated in the context of the OpenStreetMap (OSM) Semantic Network, obtaining high correlation with human judgements. Guidelines are provided for the practical usage of the approach.

Social Network Analysis and Mining, 2013
A growing corpus of online informal reviews is generated every day by non-experts, on social networks and blogs, about an unlimited range of products and services. Users do not only express holistic opinions, but often focus on specific features of their interest. The automatic understanding of “what people think” at the feature level can greatly support decision making, both for consumers and producers. In this paper, we present an approach to feature-level sentiment detection that integrates natural language processing with statistical techniques, in order to extract users’ opinions about specific features of products and services from user-generated reviews. First, we extract domain features, and each review is modelled as a lexical dependency graph. Second, for each review, we estimate the polarity relative to the features by leveraging the syntactic dependencies between the terms. The approach is evaluated against a ground truth consisting of a set of user-generated reviews, manually annotated by 39 human subjects and available online, showing its human-like ability to capture feature-level opinions.

In recent years, a web phenomenon known as Volunteered Geographic Information (VGI) has produced large crowdsourced geographic data sets. OpenStreetMap (OSM), the leading VGI project, aims at building an open-content world map through user contributions. OSM semantics consists of a set of properties (called ‘tags’) describing geographic classes, whose usage is defined by project contributors on a dedicated Wiki website. Because of its simple and open semantic structure, the OSM approach often results in noisy and ambiguous data, limiting its usability for analysis in information retrieval, recommender systems and data mining. Devising a mechanism for computing the semantic similarity of the OSM geographic classes can help alleviate this semantic gap. The contribution of this paper is twofold. It consists of (1) the development of the OSM Semantic Network by means of a web crawler tailored to the OSM Wiki website; this semantic network can be used to compute semantic similarity through co-citation measures, providing a novel semantic tool for OSM and GIS communities; (2) a study of the cognitive plausibility (i.e. the ability to replicate human judgement) of co-citation algorithms when applied to the computation of semantic similarity of geographic concepts. Empirical evidence supports the usage of co-citation algorithms, with SimRank showing the highest plausibility, to compute concept similarity in a crowdsourced semantic network.