RESEARCH ARTICLE

Leveraging Wikidata to Build Scholarly Profiles as Service

Mairelys Lemus-Rojas
Brown University

Jere Odell
IUPUI University Library

Lucille Frances Brys
IUPUI University Library

Mirian Ramirez Rojas
Ruth Lilly Medical Library, Indiana University School of Medicine

In this article, the authors share the different methods and tools utilized for supporting the Scholarly Profiles as Service (SPaS) model at Indiana University–Purdue University Indianapolis (IUPUI). Leveraging Wikidata to build a scholarly profile service aligns with interests in supporting open knowledge and provides opportunities to address information inequities. The article accounts for the authors' decision to focus first on profiles for women scholars at the university and provides a detailed case study of how these profiles are created. By describing the processes of delivering the service, the authors hope to inspire other academic libraries to work toward establishing stronger open data connections between academic institutions, their scholars, and their scholars' publications.

Keywords: scholarly profiles; Wikidata; Scholia; linked data

 

How to cite this article: Lemus-Rojas, Mairelys, Jere Odell, Lucille Frances Brys, and Mirian Ramirez Rojas. 2022. Leveraging Wikidata to Build Scholarly Profiles as Service. KULA: Knowledge Creation, Dissemination, and Preservation Studies 6(3). https://doi.org/10.18357/kula.171

Submitted: 25 June 2021 Accepted: 28 April 2022 Published: 27 July 2022

Competing interests and funding: The authors have no competing interests to declare.

Copyright: © 2022 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

 

Introduction

Contributing to projects that offer a community-driven solution to sharing knowledge freely and openly is at the core of the mission of many academic libraries. The University Library (UL) of the Indiana University–Purdue University Indianapolis (IUPUI) campus has been an early adopter of open-source platforms and of efforts to advance open knowledge. It is committed to making scholarship openly accessible to a broader community of users (IUPUI University Library 2019), in part through its participation in open knowledge projects (IUPUI University Library Center for Digital Scholarship 2021). IUPUI faculty likewise demonstrated their commitment to open knowledge when the faculty council adopted an open access policy (OAP) in 2014 (IUPUI University Library n.d.). As a result of this policy, articles written by institution-affiliated scholars that meet the policy's criteria are archived in ScholarWorks, the institutional repository. Since the implementation of the OAP, the library has managed a yearly data collection of more than three thousand articles authored by university scholars; of these, roughly 70 percent are deposited into ScholarWorks (IUPUI University Library Center for Digital Scholarship 2020). The OAP work is fundamental: it provides free access to more than thirteen thousand articles that might otherwise be behind a paywall while also increasing the visibility of their authors.

However, the infrastructure the library uses for sharing these resources has its limitations. The current version of the software used for the institutional repository (DSpace 5.6) is not configured to provide linked data profiles for authors. Therefore, the connections among authors, works, and institutions must be established on other platforms. Wikidata has the infrastructure in place to facilitate these connections and more. Using Wikidata—the structured linked data repository that serves all Wikimedia projects—and the tools it enables, such as Scholia—a web-based service that feeds from Wikidata—will enhance findability while also contributing to the dissemination of scholarship. Wikidata offers an opportunity to extend the reach of scholars’ works and connect them with other research while also making the data available in a format that is easily accessible and reusable. The library leverages the power of Wikidata to build profiles for scholars who may be less visible or systematically underserved in other systems.

This article shares the methods, approach, and priorities that inform a Scholarly Profiles as Service (SPaS) model at Indiana University–Purdue University Indianapolis (IUPUI), a campus that includes 3,693 people with academic positions. The article describes the background for the development of the service, explains a decision to focus on gender equity, and provides a description of the process of delivering the service for all women faculty affiliated with IUPUI.

Project Background

Infrastructure and Tools for Scholarly Profiles

Academic librarians who support faculty and researchers in tracking and showing scholarly productivity data rely on different approaches. For example, one approach is to populate a university content management system with lists of publications and provide links to the curriculum vitae files of the affiliated academics. But these lists are outdated in most cases, and their structure does not allow enough flexibility for the data to be reused. Another approach is to use faculty directories, researcher and author profiles (e.g., researchers’ ORCID profiles and Google Scholar profiles), personal websites, online bibliographic databases (e.g., Scopus, PubMed, and Web of Science), and open access institutional repositories to retrieve and display the researchers’ scholarly outputs. However, these sources are seldom perfectly accurate or complete.

Other tools specifically designed for scholarly profiles may provide a more reliable alternative. These systems allow any user to register, collect, and evaluate the productivity of academics at the institutional level, tracking their works and the relationships among scholars with common interests as well as managing scholars’ identities. Academic institutions often rely on proprietary software and services to maintain researchers’ profiles and research funding (e.g., Pure by Elsevier, Pivot by ProQuest, and Activity Insight by Digital Measures) (“Comparison of Research Networking Tools and Research Profiling Systems” 2021). However, open-source tools to support this work are available and openly accessible, and their use aligns with UL’s commitment to advancing openness. These tools include ReCiter (“Wcmc-Its/ReCiter” 2021), VIVO, DSpace-CRIS, and Scholia. ReCiter, developed by the Weill Cornell Medical College, is an author disambiguation system that uses a machine learning process and available identity data to generate bibliographies. It maintains publication lists of scholars by extracting data from PubMed or Scopus (Johnson et al. 2014; Albert et al. 2021). VIVO and DSpace-CRIS are tools for displaying researcher profiles (Obeid et al. 2014; Mornati 2019). In contrast, Scholia, a free web-based application, builds researcher profiles from data retrieved from Wikidata (Nielsen, Mietchen, and Willighagen 2017). The rendering of these profiles is dynamic because the application queries the Wikidata Query Service and directly displays the most up-to-date information available for a particular subject.
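Scholia's profiles are generated from SPARQL queries sent to the Wikidata Query Service at render time. The sketch below is written in Python with only the standard library and shows that general pattern under stated assumptions. It is not Scholia's own code, and the query is a simplified stand-in for the ones Scholia runs. The `User-Agent` value is a placeholder that you would replace with your own contact details. The query lists works whose author (P50) is a given item, along with their publication date (P577) when one is recorded.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid: str, limit: int = 50) -> str:
    """Return a SPARQL query for works whose author (P50) is author_qid."""
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    # wd:, wdt:, wikibase: and bd: are prefixes predefined by the query service.
    return f"""SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?date)
LIMIT {int(limit)}"""


def run_query(query: str, user_agent: str) -> list:
    """Send the query to WDQS and return the JSON result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

For example, `run_query(works_by_author_query("Q56486841"), "SPaS-sketch/0.1 (you@example.org)")` would return the works currently linked to Amanda Friesen's item. Because the request is sent each time, the result reflects whatever Wikidata holds at that moment. This is the same property that keeps Scholia's profiles current.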

Scholarly profile tools implemented at IUPUI include Scholars@IU and ORCID. Scholars@IU—a ProQuest Pivot product offered by the university’s Office for Research—provides public profiles for 10,440 faculty and academics. This profile system lists faculty and researchers by campus, school, department, division, or center and by specific expertise. It is also intended to be a tool for identifying potential collaborators and mentors throughout the Indiana University system (Indiana University n.d.). In addition, IUPUI recently became an institutional ORCID member (ORCID n.d.). The libraries and the Offices for Research, Graduate Studies, and Academic Affairs use ORCID in several systems and programs—including annual reviews, Scholars@IU, Wikidata, journal publishing, and consultations for scholarly profile management. Thus, ORCID is a crucial tool facilitating the customization, integration, and connection of IUPUI affiliates’ identifiers with other research networking tools and research profiling systems already implemented at the institution. However, ORCID adoption depends in part on the participation of individual authors. At the same time, some systems, like Scholars@IU, offer few editing opportunities to authors and the librarians that serve them. As an openly edited database, Wikidata provides an opportunity to consolidate disparate data elements and to supplement data profiles with information that is missing from proprietary systems.

Wikidata for Scholarly Profiles

Wikidata, a structured linked data knowledge base, is part of the Wikimedia ecosystem of free and openly accessible projects. It was conceived as a central repository to support all Wikimedia-related projects but has since grown beyond the limits of its original conception. Many galleries, libraries, archives, and museums (GLAMs) have been working on building capacity through community engagement within their institutions and beyond to advance open knowledge (Lemus-Rojas 2019; Allison-Cassin et al. 2019). The contributions made in Wikidata by these communities play a significant role in the pursuit of knowledge equity. A factor that has attracted users from around the world is the multilingual nature of Wikidata; while Wikipedia currently has 310 active language versions, there is only one instance of Wikidata, where all languages coexist in one central location (“List of Wikipedias” 2021). The multilingual capacity of the knowledge base has made it an ideal environment for collaboration among the growing global community of users—almost twenty-seven thousand so far (“Wikidata:Statistics” 2021). Wikidata users are able to create, edit, maintain, and use the linked open data while also taking part in the development of a growing ontology to accommodate the needs of the community. The knowledge base can even accommodate conflicting information, which can be further defined with qualifying statements and references to support the assertions being made.

Wikidata’s data model can be mapped to Resource Description Framework (RDF) triples, facilitating interoperability between Wikidata and external data sources (“Wikidata:Relation Between Properties in RDF and in Wikidata” 2017). Each triple contains a subject, a predicate, and an object. By forming these statements or assertions to describe a particular concept, contributors create new relationships that expand the knowledge graph. Wikidata entities—the content of an entry—may be items, properties, or lexemes. Each type has its own namespace: items in the main namespace, properties in the property namespace, and lexemes in the lexeme namespace (“Help:Namespaces” 2022). The system also assigns every entity in these namespaces a unique identifier composed of a letter (Q for items, P for properties, and L for lexemes) followed by a number. These QIDs and PIDs are easy for machines to process, while the labels linked to them are written in a form more easily understood by humans. For example, in Table 1, the item Q63470490 represents the work “Conscientious Women: The Dispositional Conditions of Institutional Treatment on Civic Involvement.” This item is linked to the item Q56486841, which represents the scholar Amanda Friesen, through the property P50 (author), which connects a work to the item for its author. In other words, the work Q63470490 (“Conscientious Women: The Dispositional Conditions of Institutional Treatment on Civic Involvement”) has the author (P50) Q56486841 (Amanda Friesen).

Table 1: Example of RDF triple in Wikidata
RDF triple | Machine friendly | Human friendly
Subject (Item) | Q63470490 | “Conscientious Women: The Dispositional Conditions of Institutional Treatment on Civic Involvement”
Predicate (Property) | P50 | author
Object (Value) | Q56486841 | Amanda Friesen
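The following Python sketch serializes the statement from Table 1 in both forms, making the two columns concrete. The IRIs follow Wikidata's RDF export. Items live under `http://www.wikidata.org/entity/`, and simple ("truthy") property values live under `http://www.wikidata.org/prop/direct/`. The fuller statement model, which carries qualifiers and references, uses other namespaces and is not shown here.

```python
# IRI prefixes Wikidata uses in its RDF export for items and "truthy" direct properties.
ENTITY = "http://www.wikidata.org/entity/"
DIRECT_PROPERTY = "http://www.wikidata.org/prop/direct/"

# Human-readable labels for the three identifiers in Table 1.
LABELS = {
    "Q63470490": "Conscientious Women: The Dispositional Conditions of "
                 "Institutional Treatment on Civic Involvement",
    "P50": "author",
    "Q56486841": "Amanda Friesen",
}


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialize an item-property-item statement as one N-Triples line."""
    return f"<{ENTITY}{subject}> <{DIRECT_PROPERTY}{predicate}> <{ENTITY}{obj}> ."


def to_readable(subject: str, predicate: str, obj: str) -> str:
    """Render the same triple with labels instead of identifiers."""
    return f"{LABELS[subject]} | {LABELS[predicate]} | {LABELS[obj]}"


triple = ("Q63470490", "P50", "Q56486841")
print(to_ntriples(*triple))
print(to_readable(*triple))
```

The first printed line is the form machines exchange. The second corresponds to the labels a reader sees.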

While Wikidata stores data on human knowledge broadly, the community has taken an interest in increasing the representation of bibliographic data, with a focus on scholarly articles. Publications represent roughly 43 percent of the items in Wikidata (“Statistics” n.d.). Most of these are scholarly articles, which account for 31 percent of the items stored in the knowledge base (“Wikidata:Statistics” 2021). A growing number of users contribute article data to the knowledge base, either with external tools or by manual editing. Even so, most article contributions are made through bots: tools that make automated contributions without human intervention. Many of these contributions are the efforts of WikiCite, an initiative and community that aims to build an open citation database in Wikidata. Wikidata currently contains more than 37 million entries for scholarly articles and 240 million citation links to these articles and other publications (Scholia n.d.).

Challenges and Issues: Gender Equity in Wikimedia Projects

The gender inequities in both the content and the culture of Wikimedia sites have been the focus of journalism, scholarship, and specific efforts of the Wikimedia Foundation. In all cases, most of the attention has focused on gender inequities in Wikipedia. Recent stories have focused on gaps in coverage, a culture of harassment among Wikipedians, and programmatic efforts to address the problems. For example, The Atlantic was one of several news outlets to cover Wikipedia’s omission of the physicist Donna Strickland, who had no Wikipedia article until she won the 2018 Nobel Prize in Physics (Koren 2018). The following year, The Guardian reported on the efforts of the Wikimedia Foundation to increase the number of women editors of Wikipedia (Balch 2019). In that same year, The New York Times covered the harassment of cisgender women and transgender editors of Wikipedia (Jacobs 2019). More recently, The Washington Post published a commentary on a Wikipedian’s efforts to address gender bias in the site’s coverage of political science topics. Among other biases, the author observed how rarely works by women are cited in Wikipedia’s political science entries (Baltz 2021).

Scholarly studies of gender inequities have also focused on Wikipedia. These studies explore the gender distribution of Wikipedia contributors, biases in the character and the quantity of content about women, and the structural features of Wikipedia that contribute to these inequities (Bear and Collier 2016; Ford and Wajcman 2017; Graells-Garrido, Lalmas, and Menczer 2015; Lir 2019; Sun and Peng 2021; Wagner et al. 2015; Wagner et al. 2016). A 2018 Wikimedia Foundation-supported investigation of Wikimedia editors found that 90 percent of gender self-reporting contributors identified as male and only 9 percent identified as female (“Community Insights/2018 Report/Contributors” 2019). By 2020, this imbalance eased, but only by six percentage points, with women contributors making up 15 percent of all Wikimedians (“Community Insights/Community Insights 2021 Report” 2021). Although these reports rely on opt-in surveys and may, therefore, underestimate the number of women contributors, efforts to adjust the estimates for survey-response bias improved the gender balance by less than 5 percentage points (Hill and Shaw 2013). This gender imbalance in editing means that the interests of male editors are more likely to result in new contributions. Because predominantly men contribute to and edit Wikimedia content, their interests begin to form links between subjects in such a way that women—when included—are cast as tangential to men (Wagner et al. 2015; Wagner et al. 2016; Graells-Garrido, Lalmas, and Menczer 2015).

Research addressing gender inequities in Wikidata is less common, but recent work shows that the gender disparities in Wikipedia are also found in Wikidata. Indeed, Wikidata itself can be used as a tool to assess the gender imbalance of its content (Pellissier Tanon and Suchanek 2019). For example, by querying Wikidata for instances of “human” with known-gender records, Klein and Konieczny (2015) found that instances were “84.4% male, 15.6% female, and ≈ 0.0001% nonbinary.” This research formed the foundation for Humaniki, an ongoing dashboard of gender representation in Wikimedia projects. Based on a Wikidata search completed on June 14, 2021, Humaniki (n.d.) reports that Wikidata instances of “human” with the property sex or gender (P21) remain overwhelmingly imbalanced: 81.9 percent of records are “male,” 18.1 percent “female,” and 0.05 percent “other genders.” Along with other content, Wikidata now includes records for more than forty million journal articles; about half of these articles have been linked to author records. Of the records for authors, only 15 percent have statements indicating the author’s sex or gender (P21) (Cobb 2020). Although many of the records for scholarly authors have yet to be described with the property sex or gender (P21), the overall trend on the site skews disproportionately “male,” much like in Wikipedia.
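Counts like those reported by Humaniki can be approximated for a single institution with a grouped SPARQL query. The following is a minimal Python sketch that rests on several assumptions:

- The caller supplies the employer QID, because the article does not give IUPUI's item ID.
- People are matched through employer (P108).
- Records without sex or gender (P21) are left out, as in the "known-gender" counts above.
- Anyone with more than one P21 value is counted once under each value.

The query string can be sent to the Wikidata Query Service with any SPARQL client. The helper then converts the returned bindings into percentages.

```python
MALE = "Q6581097"    # Wikidata item for "male"
FEMALE = "Q6581072"  # Wikidata item for "female"


def gender_count_query(employer_qid: str) -> str:
    """Count humans (Q5) with a given employer (P108), grouped by sex or gender (P21)."""
    return f"""SELECT ?gender (COUNT(DISTINCT ?person) AS ?count) WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P108 wd:{employer_qid} ;
          wdt:P21 ?gender .
}}
GROUP BY ?gender"""


def gender_shares(bindings: list) -> dict:
    """Turn SPARQL JSON bindings into percentage shares keyed by gender QID."""
    counts = {}
    for row in bindings:
        # Results return full entity IRIs; keep only the trailing QID.
        qid = row["gender"]["value"].rsplit("/", 1)[-1]
        counts[qid] = counts.get(qid, 0) + int(row["count"]["value"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {qid: 100 * n / total for qid, n in counts.items()}
```

Keeping the query and the arithmetic separate makes it easy to rerun the same count as entries are added, school by school.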

Case Study: Scholarly Profiles as Service (SPaS)

Context

Wikidata stores structured linked data, making them interoperable, and releases these data under a CC0 license, which enables their reuse by external tools and applications. One of these web-based applications is Scholia. Among other things, Scholia was developed to facilitate the exploration of scholarship contained in Wikidata (Nielsen, Mietchen, and Willighagen 2017). It is a freely accessible and open solution for generating scholarly profiles and facilitating integration with other web services supporting open infrastructure. By using and sharing open data, the Scholia service is in compliance with FAIR (Findable, Accessible, Interoperable, Reusable) principles. Scholia makes live SPARQL queries (SPARQL Protocol and RDF Query Language) against Wikidata to generate the profiles, which means that they always present Wikidata’s most up-to-date information. While other scholarly profile services may necessitate curation by individual scholars/researchers, in Scholia the data presented are curated by the global community of Wikidata users. When the efforts of the global community are supplemented by targeted institutional approaches (like those described here), Scholia profiles can have comparably complete coverage. The Scholia service not only generates the profiles to make it easier for users to access and analyze Wikidata’s data, but it also provides links to external tools that can be used to enhance the data of a particular section. For instance, the co-authors graph section in a researcher’s profile page can include a link for missing co-author items that may need curation (see Figure 1). Using these links to identify the missing pieces and making these enhancements in Wikidata will improve the rendering of the co-authors graph the next time the query is run. While displaying publicly accessible data about scholars and their works, the Scholia web service does not collect any private information from users.
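The curation links in Scholia's co-author panel exist because many imported articles record some authors only as author name string (P2093) values rather than as links to author items. The Python sketch below builds a similar worklist for one scholar. It is an approximation, not the query Scholia itself runs. It finds the scholar's works that still carry P2093 strings. It then ranks the unresolved names by how many of those works they appear on, so the most frequent co-authors can be disambiguated or created first.

```python
from collections import Counter


def name_string_query(author_qid: str) -> str:
    """Works by author_qid that still list co-authors as plain strings (P2093)."""
    return f"""SELECT ?work ?name WHERE {{
  ?work wdt:P50 wd:{author_qid} ;
        wdt:P2093 ?name .
}}"""


def curation_worklist(bindings: list, top: int = 10) -> list:
    """Rank unresolved co-author names by the number of distinct works they appear on."""
    works_per_name = {}
    for row in bindings:
        name = " ".join(row["name"]["value"].split())  # normalize whitespace
        works_per_name.setdefault(name, set()).add(row["work"]["value"])
    counts = Counter({name: len(works) for name, works in works_per_name.items()})
    return counts.most_common(top)
```

Because spelling variants of one person stay separate, the ranked names are candidates for review rather than finished matches.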

Fig 1
Figure 1: Co-author graph generated for an IUPUI-affiliated woman scholar showing a link to the curation page for the disambiguation and/or creation of co-author entries.

The concept for the development and implementation of the Scholarly Profiles as Service (SPaS) model at IUPUI emerged after a pilot project conducted by UL in the summer of 2017 (“Wikidata:WikiProject IUPUI University Library” 2021). This pilot used the Lilly Family School of Philanthropy as a test case and allowed us to explore the potential of Wikidata and Scholia as free and open solutions for scholarly profiles. The goal of the pilot was to represent in Wikidata the core faculty of the selected school, their co-authors (regardless of institutional affiliation), and some of their publications (Lemus-Rojas and Odell 2018). Connecting works to authors is vital for expanding the open citation graph, which is why this task was also explored during the pilot phase. This required examining the reference sections of the faculty publications to gather the information needed to create entries for the cited works and connect them to their authors. This pilot project provided the foundation for the SPaS model, discussed in the rest of this paper, which aims to provide an accurate representation of IUPUI-affiliated scholars and their publications in Wikidata. Because women are largely underrepresented across all Wikimedia projects, the first phase of the SPaS work has focused on helping bridge this prevalent gender divide in the knowledge base. It does so by prioritizing the creation of entities for IUPUI women scholars while also enhancing the citation graph for their works.

At IUPUI, academic positions include executive administration, tenured or tenure-track faculty and librarians, clinical faculty, lecturers, and research faculty. In 2020, the headcount of full-time academic positions at IUPUI totaled 3,693 people. In survey data reporting on self-identification, the Office of Institutional Research and Decision Support found that 1,626 of these 3,693 people (44 percent) identified as “female.” However, of the 1,338 tenured or tenure-track faculty and librarians, only 479 (36 percent) identified as “female” (IUPUI Institutional Research and Decision Support n.d.). In other words, on this campus, men are more likely to have job security and salaries that reflect promotion on the tenure track. Furthermore, tenured and tenure-track male faculty have historically had, and continue to have, an outsized influence on the direction of the university’s research and creative focus. IUPUI includes seventeen degree-offering schools. With the exception of Nursing, most schools have been led by male deans (Gregory H. Mobley, Archives Specialist, email correspondence to authors, May 5, 2020). Currently, only six degree-offering schools are led by female deans.
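The first two ratios below reproduce the percentages above from the reported headcounts. The last two are derived here rather than taken from the institutional report, and they make the tenure-track comparison explicit. They assume that the 2,067 positions not identified as "female" form the comparison group, since the report's other categories are not broken out in this article.

```latex
\begin{aligned}
\text{women among all academic positions} &= \tfrac{1626}{3693} \approx 44.0\% \\
\text{women among tenured/tenure-track positions} &= \tfrac{479}{1338} \approx 35.8\% \\
\text{women holding tenured/tenure-track positions} &= \tfrac{479}{1626} \approx 29.5\% \\
\text{others holding tenured/tenure-track positions} &= \tfrac{1338-479}{3693-1626} = \tfrac{859}{2067} \approx 41.6\%
\end{aligned}
```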

The exact local impact of these gender inequities at IUPUI is beyond the scope of our efforts on this project and beyond the scope of this article. However, these inequities are not unique to IUPUI and, in fact, gender inequity is a widely discussed challenge that has troubled Wikimedia projects for many years. With these inequities and the biases that they perpetuate in mind, we decided to put IUPUI’s women scholars at the forefront of our efforts. We have done so by creating Wikidata entries for all of the women scholars in a school or department while saving the male scholars for later. This may seem, at the outset, a strange approach for a project that aims to eventually provide a full open data profile of IUPUI’s authors and their scholarly products. However, it is a first step in our efforts to center the work of women in the intellectual history of the campus. By creating entries for women scholars first, we are taking a small step to reverse the typical relationships and linkages between male and female scholars. If we were to start with senior male scholars, our efforts would replicate gender dynamics that have centered men in citation networks and Wikimedia projects.

While this work to enhance the visibility of IUPUI women scholars is happening in Wikidata, its reach goes beyond the ecosystem of Wikimedia projects. Wikidata is used by a number of organizations to support their products and research. For this reason, ensuring an accurate representation of women authors will ultimately make information about them and their works more readily available across different platforms and utilities. For instance, Wikidata’s data are used by AI technologies, integrated into digital assistant tools such as Siri and Alexa (Simonite 2019; Kinsella 2019; Abellán 2019), and used by Google to generate knowledge graphs. As discussed elsewhere, our prior work with Scholia and the resulting SPaS model have been key to demonstrating the value of contributing to open platforms (Lemus-Rojas and Odell 2018). This work adds a new layer to the campus’s support for the dissemination of scholarship: it crosses the boundaries of library systems and helps improve general information resources on various subjects across disciplines.

Process

At IUPUI, we have taken several approaches to gather, curate, and contribute scholarly profile data to Wikidata. These approaches have been modified and adapted to the needs of each dataset. In our case, we have worked both to ensure the representation of IUPUI women scholars in Wikidata and to link scholars to their works. Of these two efforts, creating new entries for scholars has proven more impactful. Creating entries for scholars in the knowledge base enables other Wikimedia contributors to create content and link articles and other data to these entries.

Representing Women Scholars

Until early 2021, the SPaS work had been carried out by two librarians at UL who not only were making direct contributions to Wikidata to represent women scholars and their publications, but were also organizing and facilitating Wikidata edit-a-thons for library employees to participate in this effort. Building capacity to get others in the library involved with Wikidata-related projects has been at the forefront of our efforts to contribute to open projects. Our campus was one of the participating institutions in the Program for Cooperative Cataloging (PCC) Wikidata Pilot led by the Library of Congress. This pilot provided us with an opportunity to put out a call for volunteers and formalize a SPaS Working Group that attracted library employees from IUPUI University Library, Ruth Lilly Medical Library, and Ruth Lilly Law Library. The composition of the group was a mix of six librarians and two staff members from public services and other more technical areas.

At this time, all current IUPUI women scholars have either been added to the knowledge base or, where they were already represented, had their entries enhanced with additional data and references. To ensure the Wikidata representation of all IUPUI women scholars in assistant, associate, or full professor positions, the SPaS Working Group has taken a school-by-school approach, adding or revising entries for all tenured and tenure-track women scholars on campus one school at a time. We begin the SPaS work by accessing the institutional web pages for women scholars by school. To gather the data, we use functions in Google Sheets to scrape web pages or, in some cases, collect the data manually. At a minimum, we can retrieve the scholars’ names, affiliations, and website links from their institutional web pages. We also gather their education history and any available links to their professional networking and social media websites.
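The authors scrape web pages with Google Sheets functions. The following Python sketch illustrates the same step outside a spreadsheet; it is an alternative, not the authors' workflow. It uses only the standard library to pull scholar names and profile links from a directory page and write them to CSV for OpenRefine. The `/people/` URL marker and the column names are hypothetical. Real directory pages will need their own rules, and fetching the HTML is left out.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProfileLinkParser(HTMLParser):
    """Collect (name, url) pairs from <a> tags whose href contains a marker.

    The default marker "/people/" is a hypothetical URL convention for
    profile pages; adjust it to the directory page being scraped.
    """

    def __init__(self, base_url: str, marker: str = "/people/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.rows = []     # collected (name, absolute_url) pairs
        self._href = None  # href of the profile link currently open
        self._text = []    # text fragments inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.marker in href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = " ".join("".join(self._text).split())  # collapse whitespace
            if name:
                self.rows.append((name, self._href))
            self._href = None


def rows_to_csv(rows) -> str:
    """Serialize (name, url) rows as CSV text that OpenRefine can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "official_website"])
    writer.writerows(rows)
    return buffer.getvalue()
```

After `parser.feed(html)`, `rows_to_csv(parser.rows)` yields a file that can be opened as a new OpenRefine project.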

Once the data have been compiled in Google Sheets, we use OpenRefine to clean them up and run the Wikidata reconciliation service (“Wikidata Reconciliation for OpenRefine” n.d.). This reconciliation service is a quick way to determine whether an author is already represented in Wikidata. In such cases, we avoid creating a duplicate but can still use the data gathered to enhance the existing entry. In addition to reconciling the data in OpenRefine, we create a schema, which is essentially a template in which the edits needed for each Wikidata statement can be specified. For instance, the scholars’ names are used to label their Wikidata entities. The property affiliation (P1416) connects each scholar to the school and/or unit they are affiliated with, and official website (P856) records the web page containing their scholarly data.
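OpenRefine performs reconciliation through its graphical interface, but the service it calls follows the reconciliation API, so the same lookup can be scripted. The Python sketch below is illustrative only:

- The endpoint URL is the one commonly configured for Wikidata in OpenRefine; check it against your own OpenRefine settings.
- The restriction to type human (Q5) mirrors the scholar use case.
- The score threshold of 90 is an arbitrary choice.

Matches below the threshold come back as `None` so that a person can review them before anything is created.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Wikidata reconciliation service commonly used by
# OpenRefine. Verify it against your OpenRefine configuration.
RECON_ENDPOINT = "https://wikidata.reconci.link/en/api"


def build_queries(names: list, type_qid: str = "Q5") -> dict:
    """One reconciliation query per name, keyed q0, q1, ... to match results back."""
    return {f"q{i}": {"query": name, "type": type_qid, "limit": 3}
            for i, name in enumerate(names)}


def reconcile(names: list, user_agent: str) -> dict:
    """POST the batch of queries as form data and return the decoded JSON response."""
    body = urllib.parse.urlencode({"queries": json.dumps(build_queries(names))})
    request = urllib.request.Request(RECON_ENDPOINT, data=body.encode("utf-8"),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def best_matches(names: list, response: dict, min_score: float = 90) -> dict:
    """Map each name to a QID, or to None when it needs manual review."""
    matches = {}
    for i, name in enumerate(names):
        candidates = response.get(f"q{i}", {}).get("result", [])
        top = max(candidates, key=lambda c: c["score"], default=None)
        if top is not None and (top.get("match") or top["score"] >= min_score):
            matches[name] = top["id"]
        else:
            matches[name] = None
    return matches
```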

As part of the schema creation process, we also include a number of core properties and constant values for all entries (see Figure 2). These include the properties instance of (P31) to indicate that the entry is for a “human”; sex or gender (P21) with the value “female” because we are focusing on women scholars; languages spoken, written, or signed (P1412) with a default value of “English”; occupation (P106), which for faculty includes both values “university teacher” and “researcher”; employer (P108) to indicate they are employed by “Indiana University – Purdue University Indianapolis”; and work location (P937) with a default value of “Indianapolis.”
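Because an OpenRefine schema can be exported to QuickStatements, it helps to see what the constant part of that schema amounts to as QuickStatements (V1, tab-separated) commands. The Python sketch below is a hypothetical generator, not the authors' schema. Q5 (human), Q6581072 (female), and Q1860 (English) are standard Wikidata items. The QIDs for the two occupations, the employer, and the work location are passed in as arguments rather than hardcoded here, so that they are looked up and checked in Wikidata first.

```python
# Constant values named in the text, as (property, value) pairs:
# instance of = human, sex or gender = female, languages spoken = English.
CONSTANTS = [("P31", "Q5"), ("P21", "Q6581072"), ("P1412", "Q1860")]


def core_statements(label, occupation_qids, employer_qid, work_location_qid):
    """Return QuickStatements V1 commands that create one scholar item."""
    if '"' in label:
        raise ValueError("labels containing double quotes need escaping first")
    commands = ["CREATE", f'LAST\tLen\t"{label}"']            # English label
    commands += [f"LAST\t{pid}\t{qid}" for pid, qid in CONSTANTS]
    commands += [f"LAST\tP106\t{qid}" for qid in occupation_qids]  # occupation
    commands.append(f"LAST\tP108\t{employer_qid}")            # employer
    commands.append(f"LAST\tP937\t{work_location_qid}")       # work location
    return "\n".join(commands)
```

Calling `core_statements` with a placeholder name such as "Jane Doe" and verified QIDs yields a block that can be pasted into QuickStatements. Each `LAST` line applies to the item created by the preceding `CREATE`.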

Fig 2
Figure 2: Schema created in OpenRefine containing core properties and constant values for the creation of entries to represent IUPUI scholars.

An important feature of Wikidata is that entities can be linked to external sources through the use of identifiers. This is beneficial not only for enriching the data in Wikidata but also for making connections with other data sources. While it is not common to find identifiers on scholars’ web pages, some scholars do include links to their Twitter and LinkedIn profiles. If these links are provided, we record the information in Wikidata in the Twitter username (P2002) and LinkedIn personal profile ID (P6634) properties, respectively. We also look for identifiers in other external web services to link them to the scholars’ entries, for instance, Google Scholar author ID (P1960), ORCID iD (P496), and Scopus author ID (P1153). Assertions in Wikidata can be supported by the inclusion of references. To support our assertions, we use the properties reference URL (P854) and retrieved (P813), which records the date the URL was retrieved. Having these data points set up in the schema allows us to either upload edits directly to Wikidata or export the data to QuickStatements, a tool used to make batch edits to Wikidata.
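Two small checks fit naturally at this stage. First, ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so mistyped identifiers can be caught before upload. Second, each identifier can be written as a QuickStatements line whose reference uses the S-prefixed forms of reference URL (P854) and retrieved (P813). The Python sketch below assumes the day-precision time format (`/11`) that QuickStatements uses; the example values in the usage note are placeholders.

```python
import re
from datetime import date

ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_is_valid(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def qs_day(d: date) -> str:
    """QuickStatements time value with day precision (/11)."""
    return f"+{d.isoformat()}T00:00:00Z/11"


def identifier_statement(item_qid: str, pid: str, value: str,
                         reference_url: str, retrieved: date) -> str:
    """One QuickStatements line: an external-ID value plus a reference."""
    return "\t".join([item_qid, pid, f'"{value}"',
                      "S854", f'"{reference_url}"',   # reference URL
                      "S813", qs_day(retrieved)])     # retrieved
```

`orcid_is_valid("0000-0002-1825-0097")` returns `True`; this is ORCID's published sample iD. `identifier_statement("Q4115189", "P496", "0000-0002-1825-0097", "https://example.edu/people/jdoe", date(2021, 6, 14))` builds a line that targets the Wikidata sandbox item (Q4115189) rather than a real scholar.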

Once the items are in Wikidata, we often enhance the entries with additional information from a variety of sources. For instance, we look for education information in the scholar’s curriculum vitae (CV), if available through their institutional web page, as well as in their LinkedIn and ORCID profiles whenever possible. Any education data found for the scholar is recorded using the property educated at (P69) with the appropriate qualifiers to specify, for instance, the start time (P580), end time (P582), academic degree (P512), and academic major (P812). When CVs are available, it is often possible to add the employment information for the scholar prior to their time at IUPUI and include qualifiers to indicate the dates of employment. For instance, prior employment information is included under the existing employer (P108) statement added when creating or enhancing the scholar’s entry in Wikidata (see Figure 3). The new values to represent other employers can include qualifiers such as position held (P39), start time (P580), and end time (P582).
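Qualifiers are written on the same QuickStatements line as the statement they qualify. The following Python sketch builds an educated at (P69) line with optional start time (P580), end time (P582), academic degree (P512), and academic major (P812) qualifiers. The helper names are hypothetical, and the QIDs must be looked up for each case. The year-precision value `+YYYY-00-00T00:00:00Z/9` follows the QuickStatements time syntax as we understand it, so check the tool's help page before relying on it. The same pattern applies to the prior-employment values under employer (P108), with position held (P39) in place of degree and major.

```python
def qs_year(y: int) -> str:
    """QuickStatements time value with year precision (/9)."""
    return f"+{y:04d}-00-00T00:00:00Z/9"


def education_statement(item_qid, institution_qid, start=None, end=None,
                        degree_qid=None, major_qid=None):
    """Build an educated at (P69) line with whichever qualifiers are known."""
    columns = [item_qid, "P69", institution_qid]
    qualifiers = [
        ("P580", qs_year(start) if start is not None else None),  # start time
        ("P582", qs_year(end) if end is not None else None),      # end time
        ("P512", degree_qid),                                      # academic degree
        ("P812", major_qid),                                       # academic major
    ]
    for pid, value in qualifiers:
        if value is not None:
            columns += [pid, value]
    return "\t".join(columns)
```

Omitting unknown qualifiers, rather than writing empty values, keeps partial CV data usable without asserting anything that was not found.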

Fig 3
Figure 3: Example of an employer (P108) statement for an IUPUI-affiliated woman scholar containing prior employment information.

Typically, we used the scholar’s institutional web page as the main reference source to support the statements. However, as we created new Wikidata items to represent IUPUI women scholars and enhanced existing entries, we were conscious that every year scholars retire or move to other institutions and their web pages are taken down. When that happens, the information in a statement can no longer be verified through the reference we had used in Wikidata. Therefore, to ensure that users can still access inactive websites to verify the information in Wikidata, we initiated a project in the summer of 2020 to archive the official websites of all IUPUI women scholars using either the Internet Archive Wayback Machine or the archive.today service. That way, the reference for a statement can include the archive URL (P1065) and archive date (P2960) of the source in addition to the reference URL (P854) and retrieved (P813) date (see Figure 4).
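Archived copies can also be looked up programmatically. The Python sketch below queries the Internet Archive's Wayback Machine availability API for the closest existing snapshot of a page. It then turns that snapshot into QuickStatements reference columns for reference URL (P854), retrieved (P813), archive URL (P1065), and archive date (P2960). It is an illustration rather than the project's procedure, with three limitations:

- It does not create new snapshots; the Wayback Machine's Save Page Now feature does that.
- It does not handle archive.today.
- The `User-Agent` string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(url: str, user_agent: str):
    """Return the closest Wayback snapshot record for url, or None if there is none."""
    query = urllib.parse.urlencode({"url": url})
    request = urllib.request.Request(f"{AVAILABILITY_API}?{query}",
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload.get("archived_snapshots", {}).get("closest")


def qs_day(iso_date: str) -> str:
    """QuickStatements time value with day precision (/11) for a YYYY-MM-DD string."""
    return f"+{iso_date}T00:00:00Z/11"


def archived_reference(snapshot: dict, reference_url: str, retrieved: str) -> list:
    """Reference columns: reference URL, retrieved, archive URL, archive date."""
    ts = snapshot["timestamp"]  # Wayback timestamps look like YYYYMMDDhhmmss
    archived_on = f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"
    return ["S854", f'"{reference_url}"', "S813", qs_day(retrieved),
            "S1065", f'"{snapshot["url"]}"', "S2960", qs_day(archived_on)]
```

If `closest_snapshot` returns `None`, no snapshot exists yet and the page should be archived first. Otherwise, joining a statement's columns with these reference columns using tabs produces a complete QuickStatements line.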

Figure 4: Example of an occupation (P106) statement for an IUPUI-affiliated woman scholar containing a supporting reference URL, retrieved date, archived URL, and archived date.
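A reference that combines the live page and its archived copy could be assembled as follows. This is a sketch under stated assumptions, not our production workflow. It assumes QuickStatements V1 syntax, where reference properties take an "S" prefix (S854, S813, S1065, S2960). It also uses the Wayback Machine’s Save Page Now URL pattern to request a capture. The scholar, occupation QIDs, and URLs are hypothetical.

```python
"""Sketch: a reference with an archived copy, as QuickStatements (V1) source fields.

S854 = reference URL, S813 = retrieved, S1065 = archive URL,
S2960 = archive date. QIDs and URLs below are hypothetical.
"""
from datetime import date
from typing import Optional


def day_value(d: date) -> str:
    # QuickStatements time literal; /11 marks day precision.
    return f"+{d.isoformat()}T00:00:00Z/11"


def wayback_save_url(page_url: str) -> str:
    # Opening this URL asks the Wayback Machine (Save Page Now) to capture the page.
    return "https://web.archive.org/save/" + page_url


def referenced_statement(item: str, prop: str, value: str, page_url: str,
                         retrieved: date, archive_url: Optional[str] = None,
                         archive_date: Optional[date] = None) -> str:
    parts = [item, prop, value,
             "S854", f'"{page_url}"',
             "S813", day_value(retrieved)]
    # Add the archive fields only when both the copy and its date are known.
    if archive_url and archive_date:
        parts += ["S1065", f'"{archive_url}"', "S2960", day_value(archive_date)]
    return "\t".join(parts)


if __name__ == "__main__":
    page = "https://example.org/faculty/jane-doe"  # hypothetical profile page
    print(wayback_save_url(page))
    print(referenced_statement(
        "Q_SCHOLAR", "P106", "Q_OCCUPATION", page,
        retrieved=date(2020, 6, 1),
        archive_url="https://web.archive.org/web/20200715000000/" + page,
        archive_date=date(2020, 7, 15),
    ))
```

All four reference fields land in one reference, so the statement remains verifiable after the live page is removed.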

While SPaS is a more recent effort, for several years UL has engaged library personnel by organizing and hosting Wikidata edit-a-thons that build capacity and enhance entries for IUPUI women scholars (see Table 2). These events reflect UL’s academic open knowledge efforts and its commitment to supporting Wikimedia campaigns and other projects that contribute to the collection of knowledge. Developing and delivering hands-on training sessions on how to use Wikidata has been key to the success of these initiatives. Through both the data we contributed and the participants we recruited, we sought to address gender inequities. In 2017, the library organized its first Wikidata edit-a-thon, “Bringing IUPUI Female Faculty Members to Wikidata,” in which five editors completed nearly 750 edits. In 2018, to honor International Women’s Day, the library organized another Wikidata edit-a-thon, “Wikidata for ‘Women Creating Excellence at IUPUI,’” which produced over a thousand edits. During the “Wiki Learning Event: Wikidata” program in January 2019, fourteen editors recorded 131 edits enhancing existing entries for IUPUI-affiliated scholars. The success of these initial events inspired greater collaboration between UL and other campus libraries such as the Ruth Lilly Medical Library. For example, a one-hour Wikidata workshop hosted by UL later in January 2019, “Building Faculty Scholarly Profiles using Wikidata,” covered the basics of Wikidata and gave participants the skills to create and edit Wikidata entries for IUPUI women scholars, with a focus on the Department of Obstetrics & Gynecology (“IUPUI University Library WikiProject Programs” n.d.; “Wikipedia:GLAM/IUPUI University Library/Events” 2021).

Table 2: Sample of Wikidata-related events hosted at UL focused on building capacity and contributing to the enhancement of entries representing IUPUI women scholars
| Program/event name | Date | Number of editors | Number of women editors | Total edits |
|---|---|---|---|---|
| Bringing IUPUI Female Faculty Members to Wikidata | November 9, 2017 | 5 | 3 | 744 |
| Wikidata for “Women Creating Excellence at IUPUI” | March 7, 2018 | 7 | 6 | 1,050 |
| Wiki Learning Event: Wikidata | January 3, 2019 | 14 | 8 | 131 |
| Building Faculty Scholarly Profiles using Wikidata | January 14, 2019 | 7 | 6 | 69 |

Linking Scholars to Their Articles

Because bots already contribute large numbers of scholarly article entries to Wikidata, we have prioritized finding existing entries for works written by IUPUI women scholars and linking those works to the scholars’ author entries, rather than adding new articles. Making these connections is critical for enhancing the citation graphs of IUPUI-affiliated scholars. Since this approach relies on the Author Disambiguator tool rather than on editing entries by hand in the Wikidata interface, it has allowed us to increase participation from library personnel in support of the SPaS work.
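To show what linking changes in the data, the sketch below models an article’s author list. It assumes that bot-created entries record authors as author name string (P2093) values with a series ordinal (P1545). Linking replaces a matching string with an author (P50) value. As we understand it, the Author Disambiguator keeps the original string as an object named as (P1932) qualifier; here it is simply retained on the slot. This is a simplified, hypothetical model, not the tool’s code. A name match is only a candidate, which is why a person still has to confirm each work.

```python
"""Sketch: what linking a work to an author entry changes, on a toy model.

Hypothetical simplification: each author slot holds a series ordinal (P1545),
the printed name (P2093), and, once linked, an author (P50) QID.
"""
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AuthorSlot:
    ordinal: int                       # series ordinal (P1545)
    name_string: str                   # author name string (P2093) as printed
    author_qid: Optional[str] = None   # author (P50) once linked


def normalize(name: str) -> str:
    # Case-fold, drop periods, collapse whitespace: "Jane Q. Doe" -> "jane q doe".
    return " ".join(name.replace(".", " ").lower().split())


def link_author(slots: List[AuthorSlot], variants: Iterable[str], qid: str) -> List[int]:
    """Link unlinked slots whose name matches a known variant; return their ordinals.

    A match is only a candidate: a person should confirm the work is really
    by this scholar before the link is saved.
    """
    wanted = {normalize(v) for v in variants}
    linked = []
    for slot in slots:
        if slot.author_qid is None and normalize(slot.name_string) in wanted:
            slot.author_qid = qid   # name_string is kept, as "object named as" would be
            linked.append(slot.ordinal)
    return linked


if __name__ == "__main__":
    slots = [AuthorSlot(1, "Jane Q. Doe"), AuthorSlot(2, "John Smith")]
    print(link_author(slots, ["Jane Q Doe", "J. Q. Doe"], "Q_JANE_DOE"))  # [1]
```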

One way in which this has been accomplished at UL is by continuing to organize and host edit-a-thons and editing competitions. Between January and April 2020, UL hosted four events (see Table 3): “Wikidata Editing Competition,” “Wikidata Editing Competition 2,” “Linking IUPUI Women Faculty to Their Works in Wikidata,” and “Women Faculty Articles” (“IUPUI University Library WikiProject Programs” n.d.; “Wikipedia:GLAM/IUPUI University Library/Events” 2021). The objective of these events was to link existing entries for IUPUI-affiliated women scholars to their corresponding scholarly article entries already present in Wikidata.

Table 3: Sample of Wikidata-related events hosted at UL focused on linking IUPUI women scholars to their articles
| Program/event name | Date | Number of editors | Number of women editors | Articles linked |
|---|---|---|---|---|
| Wikidata Editing Competition | January 29, 2020 | 4 | 1 | 588 |
| Wikidata Editing Competition 2 | February 6, 2020 | 4 | 4 | 367 |
| Linking IUPUI Women Faculty to Their Works in Wikidata | March 5, 2020 | 13 | 7 | 2,254 |
| Women Faculty Articles | April 10, 2020 | 5 | 2 | 1,665 |

Of the various efforts to increase library participation, editing competitions have been the most productive. Participants worked from a shared Google Sheet that listed all IUPUI women scholars already present in Wikidata at the time of the event. Each participant claimed a scholar by adding their Wiki username in the column next to the scholar’s name. They then copied the scholar’s name from the spreadsheet, pasted it into the “author name” field of the Author Disambiguator tool, and clicked the “Look for author” button to retrieve a list of potential publications. Next, participants checked each publication to determine whether it had been authored by the IUPUI scholar or by someone else. Works confirmed as belonging to the IUPUI scholar were then linked using the “Link selected works to author” button. Finally, participants returned to the shared Google Sheet and recorded the number of articles they had linked to the chosen scholar. Keeping track of who linked the most articles to the most scholars turned what might have been seen as a tedious task into a fun and engaging one, giving the winner bragging rights and all participants a sense of accomplishment.
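The scorekeeping described above is simple enough to automate from a CSV export of the shared sheet. In this sketch, the column names (scholar, username, articles_linked) and the ranking rule (most articles first, then most scholars) are our assumptions, not the actual layout of the sheet.

```python
"""Sketch: score an editing competition from the shared sheet, exported as CSV.

Column names and the ranking rule are hypothetical.
"""
import csv
import io
from collections import defaultdict
from typing import List, Tuple


def tally(sheet_csv: str) -> List[Tuple[str, int, int]]:
    """Return (username, articles linked, scholars claimed), best first."""
    articles = defaultdict(int)
    scholars = defaultdict(int)
    for row in csv.DictReader(io.StringIO(sheet_csv)):
        user = (row.get("username") or "").strip()
        if not user:
            continue  # scholar not claimed by anyone
        count = (row.get("articles_linked") or "").strip()
        articles[user] += int(count) if count else 0
        scholars[user] += 1
    # Most articles first; ties broken by scholars claimed, then by name.
    return sorted(
        ((user, articles[user], scholars[user]) for user in articles),
        key=lambda entry: (-entry[1], -entry[2], entry[0]),
    )


if __name__ == "__main__":
    sheet = (
        "scholar,username,articles_linked\n"
        "Scholar A,EditorOne,12\n"
        "Scholar B,EditorTwo,30\n"
        "Scholar C,EditorOne,20\n"
        "Scholar D,,\n"
    )
    for user, n_articles, n_scholars in tally(sheet):
        print(user, n_articles, n_scholars)
```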

Other SPaS Efforts

Although we have prioritized creating author entries for IUPUI women scholars and linking those entries to existing entries for their articles, we have also completed work on specific topics and on all authors from selected schools. Specifically, we have added or enhanced Wikidata entries for all COVID-related works by IUPUI authors, ensured that those authors were also represented, and linked the works to them. In addition, we have added or enhanced entries for all works authored in 2019 by scholars from three campus schools: Education, Philanthropy, and Public Health. As part of IUPUI’s open access policy workflow, UL manages a complete dataset of all articles authored by campus authors in any given year. Scholarly articles in this dataset meet the criteria for inclusion under the terms of the policy. All articles that are already open access, or that can be made open access, are deposited in the institutional repository, IUPUI ScholarWorks. The remaining articles are flagged, and their IUPUI authors receive an email notification requesting that they send the library a version (typically the accepted manuscript) that can be openly archived. Because this work requires the library to maintain a complete and relatively clean metadata collection for works authored by IUPUI scholars, we can reuse the data in our SPaS effort. In completing this work for the three schools, we added or enhanced entries for 198 articles, using external tools and utilities to automate the work (e.g., SourceMD, Zotero Wikidata Translator), and linked these articles to their IUPUI authors (either manually or with the Author Disambiguator tool).
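Before a yearly list of articles is sent to a tool such as SourceMD, it is useful to know which works already have entries. The sketch below builds a Wikidata Query Service request that matches DOIs against DOI (P356). It assumes, as we understand the convention, that Wikidata stores DOIs in upper case, so values are upper-cased before matching. The user-agent string is a placeholder. The network call is included for completeness but is not needed to build the query.

```python
"""Sketch: check which DOIs from a yearly dataset already have Wikidata entries.

Assumes DOIs are stored under DOI (P356) in upper case. The endpoint is the
public Wikidata Query Service; the User-Agent value is a placeholder.
"""
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, List

ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_literal(doi: str) -> str:
    # Upper-case and escape so the DOI is a valid SPARQL string literal.
    clean = doi.strip().upper().replace("\\", "\\\\").replace('"', '\\"')
    return f'"{clean}"'


def build_query(dois: Iterable[str]) -> str:
    values = " ".join(sparql_literal(d) for d in dois)
    return (
        "SELECT ?doi ?work WHERE {\n"
        f"  VALUES ?doi {{ {values} }}\n"
        "  ?work wdt:P356 ?doi .\n"
        "}"
    )


def fetch_matches(dois: List[str]) -> Dict[str, str]:
    """Return {DOI: item URI} for DOIs already in Wikidata (requires network access)."""
    params = urllib.parse.urlencode({"query": build_query(dois), "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "spas-sketch/0.1 (contact@example.org)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return {b["doi"]["value"]: b["work"]["value"] for b in data["results"]["bindings"]}


def missing(dois: List[str], found: Dict[str, str]) -> List[str]:
    """DOIs that still need a new entry (e.g., created with SourceMD)."""
    return [d for d in dois if d.strip().upper() not in found]
```

Separating query construction from the request makes the query easy to paste into the Query Service interface by hand.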

In doing this work, we learned that some disciplines are more likely than others to benefit from bots that regularly contribute data from open databases such as PubMed. For example, more than 90 percent (101 of 111) of the School of Public Health’s 2019 articles were already in Wikidata. In contrast, none of the forty-two works published in 2019 by School of Philanthropy authors had been previously added to Wikidata. Similarly, only one of the forty-five 2019 publications from the School of Education had been previously added to Wikidata by a bot, and that article was indexed by PubMed. The potential undercoverage of social science and humanities scholarship in Wikidata is beyond the scope of this article, but the differences in coverage we found may inform which data we prioritize for future Wikidata contributions.
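The coverage figures above follow directly from the reported counts. The short calculation below recomputes them and confirms that the three schools together account for the 198 articles mentioned earlier. The counts are taken from the text; nothing here queries Wikidata.

```python
"""Sketch: recompute per-school Wikidata coverage from the counts reported above."""

# (2019 works already in Wikidata, total 2019 works), per school.
COUNTS = {
    "Public Health": (101, 111),
    "Philanthropy": (0, 42),
    "Education": (1, 45),
}


def coverage(already: int, total: int) -> float:
    """Percentage of a school's works that were already in Wikidata."""
    return 100.0 * already / total


if __name__ == "__main__":
    for school, (already, total) in COUNTS.items():
        print(f"{school}: {already}/{total} = {coverage(already, total):.1f}%")
    already_all = sum(a for a, _ in COUNTS.values())
    total_all = sum(t for _, t in COUNTS.values())  # 111 + 42 + 45 = 198
    print(f"All three: {already_all}/{total_all} = {coverage(already_all, total_all):.1f}%")
```

Run as a script, this prints 91.0%, 0.0%, and 2.2% for the three schools, and 102/198 = 51.5% overall.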

Conclusion and Future Work

In this article, we have shared the different methods used to advance the Scholarly Profiles as Service (SPaS) model at IUPUI. This service not only increases the visibility of campus scholars in an open environment but may also help raise our institution’s reputation. As profiles for our scholars and the works they created are added to Wikidata, the collective work of the university’s scholars becomes easier to discover and cite.

While we see the value of this type of work, we also understand that integrating it into existing, often more traditional, library services can take time. However, after successfully contributing the 2019 works of three campus schools, we are enthusiastic about prospectively extending the SPaS work to all campus schools. The IUPUI open access policy workflow manages a dataset of more than three thousand articles and other campus publications every year. These data, derived from institutional reports and other sources, are often incomplete (e.g., lacking a DOI) or incorrect (e.g., typos in titles or sources). To date, UL has cleaned the metadata well enough to retrieve accurate DOIs, remove duplicates, identify open access status, and (if needed) notify an author for participation in the policy. We aim to build on this work, without overly increasing the labor involved, to systematically add all articles from the IUPUI open access policy workflow to Wikidata each year. In effect, this would create a complete collection in Wikidata of scholarly articles by IUPUI authors for each year.
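Part of the cleanup described above is making DOIs comparable so that duplicates can be removed and records without a DOI can be set aside for manual lookup. The sketch below shows one way to do this. It assumes a hypothetical record layout (dicts with "title" and "doi") and uses a loose DOI pattern. Because DOIs are case-insensitive, they are compared in lower case.

```python
"""Sketch: normalize DOIs and remove duplicates in a yearly article list.

The record layout (dicts with "title" and "doi") is hypothetical.
"""
import re
from typing import Dict, List, Optional, Tuple

# Loose DOI pattern: "10.", a registrant code, "/", then a non-space suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Extract a bare DOI from values like "https://doi.org/10..." or "doi:10..."."""
    if not raw:
        return None
    match = DOI_PATTERN.search(raw)
    if match is None:
        return None
    # Trailing punctuation usually comes from the citation, not the DOI.
    return match.group(0).rstrip(".,;").lower()


def dedupe(records: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (unique records with clean DOIs, records that still lack a DOI)."""
    seen = set()
    unique, needs_lookup = [], []
    for record in records:
        doi = normalize_doi(record.get("doi"))
        if doi is None:
            needs_lookup.append(record)   # resolve by title before contributing
            continue
        if doi in seen:
            continue                      # same work reported twice
        seen.add(doi)
        unique.append({**record, "doi": doi})
    return unique, needs_lookup
```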

Libraries have an opportunity to place themselves at the forefront of the open knowledge movement by contributing data to open infrastructures, but convincing library administrators of the importance of taking action now may prove challenging. This may be due, in part, to the nature of the Wikidata environment, where anyone can contribute, edit, and curate the data. Many libraries may be more comfortable with the controlled environments that they have traditionally used for the curation of knowledge. Library-based Wikidata services, therefore, need consistent and ongoing internal and external outreach.

In addition to ongoing outreach to stakeholders, one initiative we hope to advance as we continue the SPaS work is an online submission form for collecting scholars’ CVs and gathering their consent to contribute their images to Wikimedia Commons, the media repository for all Wikimedia projects. These images could be the ones displayed on the scholar’s institutional web page or other images to which the scholar holds the rights. For the CVs, we anticipate having to test and adopt PDF data extraction utilities so that specific data can be extracted from the submitted files more programmatically.

Because the SPaS work currently focuses on increasing the visibility of IUPUI women scholars and their works, we are also planning to revisit all author entries we have created to ensure that each includes a sex or gender (P21) statement with proper references (see Figure 5). To this end, we have created a model for referencing this statement. It includes the reference URL (P854) for the scholar’s institutional web page, the retrieved date (P813), the archive URL (P1065), the archive date (P2960), and a quotation (P1683) from the scholar’s institutional web page that captures the pronouns they use to describe themselves. Whenever a quotation is added, we include the property based on heuristic (P887) with the value inferred from grammatical gender used in text (Q94997488) and, where applicable, a second value indicating that the statement was also inferred from given name (Q69652498). By adding references to gender statements in Wikidata, we aim to avoid misgendering. At the same time, we believe that our work on Wikidata entries can respond to the expressed gender identities of authors at our university in a way that closed, proprietary systems cannot.

Figure 5: Example of a reference added to support the sex or gender (P21) statement for an IUPUI-affiliated woman scholar.
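The reference model for sex or gender (P21) statements can be expressed as a single batch-edit line. This is a minimal sketch, assuming QuickStatements V1 syntax, where source properties take an "S" prefix and monolingual text is written as lang:"text". Q94997488 and Q69652498 come from the text, and Q6581072 is Wikidata’s "female" item. The scholar QID and URLs are hypothetical. We have not confirmed that QuickStatements accepts a repeated S887 in one line; if it does not, the second heuristic value can be added by hand.

```python
"""Sketch: the gender-statement reference model as one QuickStatements (V1) line.

Scholar QID and URLs are hypothetical; Q6581072 is the "female" item.
"""
from datetime import date

FEMALE = "Q6581072"
FROM_GRAMMATICAL_GENDER = "Q94997488"  # inferred from grammatical gender used in text
FROM_GIVEN_NAME = "Q69652498"          # inferred from given name


def day_value(d: date) -> str:
    # QuickStatements time literal; /11 marks day precision.
    return f"+{d.isoformat()}T00:00:00Z/11"


def gender_statement(scholar: str, gender: str, page_url: str, retrieved: date,
                     archive_url: str, archive_date: date, quotation: str,
                     also_from_given_name: bool = False, lang: str = "en") -> str:
    if '"' in quotation:
        # We are unsure how QuickStatements escapes embedded quotes; edit these by hand.
        raise ValueError("quotation contains a double quote")
    parts = [scholar, "P21", gender,
             "S854", f'"{page_url}"',              # reference URL
             "S813", day_value(retrieved),          # retrieved
             "S1065", f'"{archive_url}"',           # archive URL
             "S2960", day_value(archive_date),      # archive date
             "S1683", f'{lang}:"{quotation}"',      # quotation (monolingual text)
             "S887", FROM_GRAMMATICAL_GENDER]       # based on heuristic
    if also_from_given_name:
        parts += ["S887", FROM_GIVEN_NAME]
    return "\t".join(parts)
```

Refusing quotations with embedded double quotes is a deliberately conservative choice. It avoids silently producing a malformed command, and the edit can then be made in the Wikidata interface instead.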

We hope that the methods described in this article inspire other academic institutions to take on pilot projects that support ongoing efforts to bridge the gender divide in Wikidata while also contributing to strengthening the connections between academic institutions, their scholars, and their scholarly output.

References

Abellán, Jorge. 2019. “Wikidata le da alas al conocimiento abierto” [Wikidata gives wings to open knowledge]. In The Internet Health Report 2019. https://internethealthreport.org/2019/wikidata-le-da-alas-al-conocimiento-abierto/?lang=es. Archived at: https://archive.ph/CuAYF.

Albert, Paul J., Sarbajit Dutta, Jie Lin, Zimeng Zhu, Michael Bales, Stephen B. Johnson, Mohammad Mansour, Drew Wright, Terrie R. Wheeler, and Curtis L. Cole. 2021. “ReCiter: An Open Source, Identity-Driven, Authorship Prediction Algorithm Optimized for Academic Institutions.” PLoS ONE 16 (4): e0244641. https://doi.org/10.1371/journal.pone.0244641.

Allison-Cassin, Stacy, Alison Armstrong, Phoebe Ayers, Tom Cramer, Mark Custer, Mairelys Lemus-Rojas, Sally McCallum, Merrilee Proffitt, Mark A. Puente, Judy Ruttenberg, and Alex Stinson. 2019. “ARL White Paper on Wikidata: Opportunities and Recommendations.” ScholarWorks. https://scholarworks.iupui.edu/handle/1805/18902.

Balch, Oliver. 2019. “Making the Edit: Why We Need More Women in Wikipedia.” The Guardian, November 28, 2019. http://www.theguardian.com/careers/2019/nov/28/making-the-edit-why-we-need-more-women-in-wikipedia. Archived at: https://archive.vn/3L713.

Baltz, Samuel. 2021. “Wikipedia’s Political Science Coverage Is Biased. I Tried to Fix It.” The Washington Post, February 24, 2021. https://www.washingtonpost.com/politics/2021/02/24/wikipedias-political-science-coverage-is-biased-i-tried-fix-it/. Archived at: https://archive.ph/PAjhe.

Bear, Julia B., and Benjamin Collier. 2016. “Where Are the Women in Wikipedia? Understanding the Different Psychological Experiences of Men and Women in Wikipedia.” Sex Roles 74 (5): 254–65. https://doi.org/10.1007/s11199-015-0573-y.

Cobb, Simon. 2020. “Author Items in Wikidata.” WikiCite Virtual Conference 2020. https://commons.wikimedia.org/wiki/File:Author_items_in_Wikidata.pdf. Archived at: https://web.archive.org/web/20220601053916/https://upload.wikimedia.org/wikipedia/commons/7/79/Author_items_in_Wikidata.pdf.

“Community Insights/2018 Report/Contributors.” 2019. Meta. https://meta.wikimedia.org/wiki/Community_Insights/2018_Report/Contributors. Archived at: https://archive.ph/vGKxz.

“Community Insights/Community Insights 2021 Report.” 2021. Meta. https://meta.wikimedia.org/wiki/Community_Insights/Community_Insights_2021_Report. Archived at: https://archive.ph/T8V8B.

“Comparison of Research Networking Tools and Research Profiling Systems.” 2021. Wikipedia. https://en.wikipedia.org/w/index.php?title=Comparison_of_research_networking_tools_and_research_profiling_systems&oldid=1023327450. Archived at: https://archive.vn/9Jm1R.

Ford, Heather, and Judy Wajcman. 2017. “‘Anyone Can Edit’, Not Everyone Does: Wikipedia’s Infrastructure and the Gender Gap.” Social Studies of Science 47 (4): 511–27. https://doi.org/10.1177/0306312717692172.

Graells-Garrido, Eduardo, Mounia Lalmas, and Filippo Menczer. 2015. “First Women, Second Sex: Gender Bias in Wikipedia.” In HT ’15: Proceedings of the 26th ACM Conference on Hypertext & Social Media, 165–74. New York: Association for Computing Machinery. https://doi.org/10.1145/2700171.2791036.

“Help:Namespaces.” 2022. Wikidata. https://www.wikidata.org/wiki/Help:Namespaces. Archived at: https://archive.vn/Dr24I.

Hill, Benjamin Mako, and Aaron Shaw. 2013. “The Wikipedia Gender Gap Revisited: Characterizing Survey Response Bias with Propensity Score Estimation.” PLoS ONE 8 (6): e65782. https://doi.org/10.1371/journal.pone.0065782.

Humaniki. n.d. “Gender Gap by Language Editions in Wikimedia Projects.” Accessed June 24, 2021. https://humaniki.wmcloud.org/gender-by-language.

Indiana University. n.d. “Scholars@IU.” Accessed June 3, 2021. https://scholars.proquest.com/gallery/indiana. Archived at: https://archive.vn/kgHtk.

IUPUI Institutional Research and Decision Support. n.d. “Employee Headcount Calculator.” Accessed May 5, 2021. https://tableau.bi.iu.edu/t/prd/views/FacultyStaffandStudentEmployees/EmployeeHeadcountCalculator?:iid=6&:isGuestRedirectFromVizportal=y&:embed=y. Archived at: https://archive.vn/62nqL.

IUPUI University Library. 2019. “Open Values.” https://cds.ulib.iupui.edu/openvalues. Archived at: https://archive.vn/hrrvB.

IUPUI University Library. n.d. “Open Access Policy, IUPUI Faculty Council (October 7, 2014).” Accessed June 8, 2021. https://openaccess.iupui.edu/policy. Archived at: https://archive.vn/dgNsP.

IUPUI University Library Center for Digital Scholarship. 2020. “IUPUI Open Access Policy: Annual Report for 2019.” https://scholarworks.iupui.edu/handle/1805/24383.

IUPUI University Library Center for Digital Scholarship. 2021. “IUPUI University Library Commitment to Open Knowledge.” https://cds.ulib.iupui.edu/digitalscholarship/openknowledge. Archived at: https://archive.vn/TVcAJ.

“IUPUI University Library WikiProject Programs.” n.d. Accessed May 30, 2021. https://outreachdashboard.wmflabs.org/campaigns/iupui_university_library_wikiproject/programs. Archived at: https://archive.vn/QjfhW.

Jacobs, Julia. 2019. “Wikipedia Isn’t Officially a Social Network. But the Harassment Can Get Ugly.” The New York Times, April 8, 2019. https://www.nytimes.com/2019/04/08/us/wikipedia-harassment-wikimedia-foundation.html. Archived at: https://archive.ph/Eye1A.

Johnson, Stephen B., Michael E. Bales, Daniel Dine, Suzanne Bakken, Paul J. Albert, and Chunhua Weng. 2014. “Automatic Generation of Investigator Bibliographies for Institutional Research Networking Systems.” Journal of Biomedical Informatics 51 (October): 8–14. https://doi.org/10.1016/j.jbi.2014.03.013.

Kinsella, Bret. 2019. “Voice Assistants Alexa, Bixby, Google Assistant and Siri Rely on Wikipedia and Yelp to Answer Many Common Questions about Brands.” Voicebot.Ai. July 11, 2019. http://voicebot.ai/2019/07/11/voice-assistants-alexa-bixby-google-assistant-and-siri-rely-on-wikipedia-and-yelp-to-answer-many-common-questions-about-brands/. Archived at: https://archive.ph/fGww2.

Klein, Maximilian, and Piotr Konieczny. 2015. “Gender Gap Through Time and Space: A Journey Through Wikipedia Biographies and the ‘WIGI’ Index.” arXiv:1502.03086 [cs.CY]. https://doi.org/10.48550/arXiv.1502.03086.

Koren, Marina. 2018. “One Wikipedia Page Is a Metaphor for the Nobel Prize’s Record With Women.” The Atlantic, October 2, 2018. https://www.theatlantic.com/science/archive/2018/10/nobel-prize-physics-donna-strickland-gerard-mourou-arthur-ashkin/571909/. Archived at: https://archive.vn/Vxyg2.

Lemus-Rojas, Mairelys. 2019. “Open Knowledge Report (2017-2018).” ScholarWorks. https://scholarworks.iupui.edu/handle/1805/18586.

Lemus-Rojas, Mairelys, and Jere D. Odell. 2018. “Creating Structured Linked Data to Generate Scholarly Profiles: A Pilot Project Using Wikidata and Scholia.” Journal of Librarianship and Scholarly Communication 6 (1): eP2272. https://doi.org/10.7710/2162-3309.2272.

Lir, Shlomit Aharoni. 2019. “Strangers in a Seemingly Open-to-All Website: The Gender Bias in Wikipedia.” Equality, Diversity and Inclusion: An International Journal 40 (7): 801–18. https://doi.org/10.1108/EDI-10-2018-0198.

“List of Wikipedias.” 2021. Wikimedia. Accessed May 5, 2021. https://meta.wikimedia.org/wiki/List_of_Wikipedias. Archived at: https://archive.ph/7RWzu.

Mobley, Greg H. Email correspondence. May 5, 2020. Archives Specialist, IUPUI University Library, Ruth Lilly Special Collections and Archives.

Mornati, Susanna. 2019. “DSpace-CRIS: The Free Open-Source CRIS – What’s New in 2019.” euroCRIS. https://dspacecris.eurocris.org/handle/11366/979.

Nielsen, Finn Årup, Daniel Mietchen, and Egon Willighagen. 2017. “Scholia and Scientometrics with Wikidata.” arXiv:1703.04222 [cs.DL]. https://doi.org/10.48550/arXiv.1703.04222.

Obeid, Jihad S., Layne M. Johnson, Sarah Stallings, and David Eichmann. 2014. “Research Networking Systems: The State of Adoption at Institutions Aiming to Augment Translational Research Infrastructure.” Journal of Translational Medicine & Epidemiology 2 (2): 1026. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4610407/.

ORCID. n.d. “IUPUI.” Accessed June 3, 2021. https://orcid.org/members/0010f00002Jfe4XAAR-iupui. Archived at: https://archive.vn/X35ap.

Pellissier Tanon, Thomas, and Fabian Suchanek. 2019. “Querying the Edit History of Wikidata.” The Semantic Web: ESWC 2019 Satellite Events, 161–66. https://doi.org/10.1007/978-3-030-32327-1_32.

Scholars@IU. n.d. Accessed June 3, 2021. https://scholars.proquest.com/gallery/indiana. Archived at: https://archive.vn/kgHtk.

Scholia. n.d. Accessed May 5, 2021. https://scholia.toolforge.org/. Archived at: https://archive.ph/vwG1O.

Simonite, Tom. 2019. “Inside the Alexa-Friendly World of Wikidata.” WIRED. February 18, 2019. https://www.wired.com/story/inside-the-alexa-friendly-world-of-wikidata/. Archived at: https://archive.ph/6JMiM.

“Statistics.” n.d. WikiCite. Accessed January 4, 2022. http://wikicite.org/statistics.html. Archived at: https://archive.vn/kdCaR.

Sun, Jiao, and Nanyun Peng. 2021. “Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia.” arXiv:2106.01601 [cs.CL]. https://doi.org/10.48550/arXiv.2106.01601.

Wagner, Claudia, David Garcia, Mohsen Jadidi, and Markus Strohmaier. 2015. “It’s a Man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia.” Proceedings of the International AAAI Conference on Web and Social Media 9 (1). https://ojs.aaai.org/index.php/ICWSM/article/view/14628.

Wagner, Claudia, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. “Women through the Glass Ceiling: Gender Asymmetries in Wikipedia.” EPJ Data Science 5: 5. https://doi.org/10.1140/epjds/s13688-016-0066-4.

“Wcmc-Its/ReCiter.” 2021. GitHub. https://github.com/wcmc-its/ReCiter.

“Wikidata Reconciliation for OpenRefine.” n.d. Wikidata. Accessed December 15, 2021. https://wikidata.reconci.link/.

“Wikidata:Relation between properties in RDF and in Wikidata.” 2017. Wikidata. https://www.wikidata.org/wiki/Wikidata:Relation_between_properties_in_RDF_and_in_Wikidata. Archived at: https://archive.ph/lZLub.

“Wikidata:Statistics.” 2021. Wikidata. https://www.wikidata.org/wiki/Wikidata:Statistics. Archived at: https://web.archive.org/web/20210401012913/https://www.wikidata.org/wiki/Wikidata:Statistics.

“Wikidata:WikiProject IUPUI University Library.” 2021. Wikidata. https://www.wikidata.org/wiki/Wikidata:WikiProject_IUPUI_University_Library. Archived at: https://archive.ph/WHCpQ.

“Wikipedia:GLAM/IUPUI University Library/Events.” 2021. Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:GLAM/IUPUI_University_Library/Events. Archived at: https://archive.vn/xPNC1.