The Marmaduke Problem: A Case Study of Comics as Linked Open (Meta)data

Kate Topham
Michigan State University

Julian Chambliss
Michigan State University

Justin Wigard
Michigan State University

Nicole Huff
Michigan State University

Michigan State University (MSU) is home to one of the largest library comics collections in North America, holding over three hundred thousand print comic book titles and artifacts. Inspired by the interdisciplinary opportunity offered by digital humanities practice, a research collaborative linked to the MSU Library Digital Scholarship Lab (DSL) developed a Collections as Data project focused on the Comic Art Collection. This team extracted and cleaned over forty-five thousand MARC records describing comics published in Canada, Mexico, and the United States. The dataset is openly available through a GitLab repository, where the team has shared data visualizations so that scholars and members of the public can explore and interrogate this unique collection. In order to bridge digital humanities with the popular culture legacy of the institution, the MSU comics community turned to bibliographic metadata as a new way to leverage the collection for scholarly analysis. In October 2020, the Department of English Graphic Possibilities Research Workshop gathered a group of scholars, librarians, Wikidatians, and enthusiasts for a virtual Wikidata edit-a-thon. This project report will present this event as a case study to discuss how linked open metadata may be used to create knowledge and how community knowledge can, in turn, enrich metadata. We explore not only how our participants utilized the open-access tool Mix’n’match to connect the Comic Art Collection dataset to Wikidata and increase awareness of lesser-known authors and regional publishers missing from OCLC and Library of Congress databases, but how the knowledge of this community in turn revealed issues of authority control.

Keywords: linked open data; comics; Wikidata; authority control; special collections


How to cite this article: Topham, Kate, Julian Chambliss, Justin Wigard, and Nicole Huff. 2022. The Marmaduke Problem: A Case Study of Comics as Linked Open (Meta)data. KULA: Knowledge Creation, Dissemination, and Preservation Studies 6(3).

Submitted: 25 June 2021 Accepted: 22 October 2021 Published: 27 July 2022

Competing interests and funding: The authors declare that they have no competing interests.

Copyright: © 2022 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See



With over three hundred thousand print comic books and artifacts, Michigan State University (MSU) houses one of the largest library comics collections in North America. Despite decades of careful cataloging enhanced through extensive indexing by comic bibliographer Randy Scott (1991), the Comic Art Collection’s metadata remains a resource not fully explored. Therefore, a research collaborative linked to the MSU Library Digital Scholarship Lab developed the Comics as Data North America (CaDNA) project to extract and clean over forty-five thousand MARC records describing comics published in North America. However, while this dataset provided insight into the extensive collection, there was still much work needed to make this data accessible and to flesh out the information provided. This is especially important given MSU’s legacy in popular culture studies.

Building on this project, the MSU Department of English Graphic Possibilities Research Workshop (GPRW) gathered a group of scholars, librarians, Wikidatians, and enthusiasts for a virtual Wikidata edit-a-thon in October 2020. Participants in this event added missing comics publishers and authors from 1929 to 1956 to Wikidata with information from the Comic Art Collection metadata. This event was the first in a series of ongoing Wikidata-driven events.

This project report analyzes the Wikidata events hosted by the MSU GPRW as a case study to discuss how linked open metadata may be used to create knowledge and how community knowledge can, in turn, enrich metadata. In our workshops, participants connected the Comic Art Collection dataset to Wikidata, and in doing so discovered and remedied authority control errors in the MSU Libraries (MSUL) records. Our use of Wikidata enabled us to correct and enrich our metadata by lowering the barrier to entry, allowing for discussion, and accepting community input.

Origin Story

CaDNA originated as part of a grant proposal developed by faculty, librarians, and digital humanists at MSU1 in response to the call for proposals for Collections as Data: Part to Whole (2018), an Andrew W. Mellon-funded grant competition to foster development of models that support collections as data implementation and use. The initial grant project proposal was not successful, yet it brought together a group that could build on the established dataset research2 at MSUL. Inspired by these past projects, the group sought to explore how the Comic Art Collection metadata could address questions about the relationships between texts and their communities. Our proximity to the Comic Art Collection made us uniquely situated to explore the ways that digital scholarship might be leveraged to create data-driven explorations of comic culture not previously addressed.

While much of the literature on digital humanities (DH) and comics emphasizes ways to think about comics as digital objects, our collection places us in the unique position of defining how institutional collections form our vision of the medium (see Whitson and Salter 2015). A collections-as-data approach can identify patterns of cultural production that might highlight communities of practice, underrepresented groups, and patterns of literary output not easily accessible through qualitative means. Our effort does not speak to broader questions about using DH tools to study the comics themselves; rather, we focus on relationships represented by the data and thus introduce a new perspective drawn from our collective knowledge of relationships within the comics community.

Creating the Dataset

In 2017, the first Collections as Data National Forum was held at University of California – Santa Barbara in order to encourage computational use of and provide guidance for ethical engagement in library collections. One outcome of this forum was the generation of the “Santa Barbara Statement on Collections as Data” (SBSCD), which poses several questions interrogating the use of collections as data and offers guiding principles for treating collections as data (Always Already Computational – Collections as Data 2018). Drawing upon both these questions and principles (especially principles three, six, and ten), CaDNA began by extracting records from the MSU catalog and converting them into spreadsheet format. In pursuit of questions of locality and evolution over time, we focused on cleaning dates and locations. We used Flourish to plot the number of titles published from 1888 to 2019 according to place of publication (Figure 1) (Topham 2020a).

Figure 1: Comics as Data North America’s visualization of the number of comic titles published in North American locations in 1980.

New York and California emerged as major hotspots due to the locations of Marvel and DC Comics, respectively, but the visualization also highlights smaller publishers across North America. This raised questions regarding both DH and comics: Who are these creators? Who is publishing with whom? Can we trace the growth of comics communities through this bibliographic data?

The Turn Toward Wikidata

To answer these questions, the team turned toward the publisher and author fields of the MARC records (LOC 1999). The Comic Art Collection contains thousands of items from mainstream publishers and creators as well as a rich collection of comics from small underground communities. Publisher data in particular presented many new challenges in data cleaning, such as subsidiaries, imprints, and name changes. Furthermore, how the catalog represents publisher names varies from item to item. A comic published by Marvel may be represented as “Marvel Comics,” “Marvel Comics Group,” “Marvel Publishing,” “Marvel Publishing, Inc.,” and so on. This variation stems from a lack of authority control over publishers in the catalog.3

In order for this dataset to be useful, all versions of a publisher’s or author’s name must be connected under the same value. We found a solution in OpenRefine’s Wikidata Reconciliation Service, which attempts to match each value in a given column to a Wikidata item. It would connect the records for “Marvel Comics,” “Marvel Publishing,” and “Marvel Publishing, Inc.” to one item: Marvel Comics (Q173496). This cleaning process revealed that many of our entities were missing in Wikidata. About 50 percent of publisher names were matched automatically, 30 percent could be matched manually to a Wikidata item, and about 20 percent did not have a match on Wikidata. This translates to about ten thousand missing publishers.
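The idea of collapsing name variants onto a single identifier can be sketched in a few lines of Python. This is an illustration only, not OpenRefine's matching algorithm: the normalization rules and the `KNOWN_PUBLISHERS` lookup table are assumptions made for the example, and only the QID for Marvel Comics is taken from the text above.

```python
import re
from typing import Optional

# Hypothetical lookup table: normalized publisher key -> Wikidata QID.
# Only Q173496 (Marvel Comics) comes from the article.
KNOWN_PUBLISHERS = {"marvel": "Q173496"}

# Corporate words dropped before lookup (an illustrative, incomplete list).
NOISE_WORDS = {"comics", "group", "publishing", "inc", "co", "company"}


def normalize(name: str) -> str:
    """Lowercase a catalog string and drop punctuation and noise words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in NOISE_WORDS)


def reconcile(name: str) -> Optional[str]:
    """Return the QID for a publisher string, or None if it is unmatched."""
    return KNOWN_PUBLISHERS.get(normalize(name))


if __name__ == "__main__":
    for variant in ["Marvel Comics", "Marvel Publishing, Inc.", "Fawcett"]:
        print(variant, "->", reconcile(variant))
```

A string that returns `None` here corresponds to the unmatched share of names described above: it needs either a manual match or a new Wikidata item.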

When we applied the same process to the author field, it initially seemed that this reconciliation was more successful than that for the publishers. OpenRefine automatically matched a higher percentage of entries to Wikidata items, in part because the MSU catalog had standardized this field through the use of Library of Congress Name Authority Files.4 However, in evaluating the automatic matches, we found that a significant number of entries were incorrect: the reconciliation service had matched the entry to an incorrect item with a similar name. Our team investigated this problem, but found it was too burdensome to solve by ourselves. At the end of the reconciliation process, we were left with ten thousand unmatched publishers, five thousand unmatched authors, and an undetermined number of incorrect matches. If we had simply put our data onto Wikidata by ourselves, many of the relationships we hoped to study might be missing or, worse, inaccurate. In order to turn this metadata into linked data, we were going to need some help.

Assembling the Edit-Avengers

There was an idea . . . called the Edit-Avengers Initiative. The idea was to bring together a group of remarkable people, see if they could make this data something more. See if they could work together when we needed them to, to create the information we never could.

To this end, the MSU Department of English GPRW hosted two related virtual Wikidata events connected to the CaDNA collection: one in fall 2020 and one in spring 2021. The fall event used bibliographic metadata to add missing comics publishers and authors from 1929 to 1956 to Wikidata. We partnered with several other comics institutions, and in doing so, we brought together fifty-five participants from MSU and other institutions across the United States, Canada, and the United Kingdom. We used Discord as our primary social platform due to its multiple avenues of communication. Using a tutorial we created for Mix’n’match (Topham 2020b), a Wikidata tool that loads sections of large data and assigns them randomly to users to upload into Wikidata, our participants matched 2,227 of our authors and publishers from the MSU Comic Art Collection to items on Wikidata.

The spring 2021 event focused on creating visualizations utilizing the uploaded Wikidata to address one central question: how can Wikidata liberate our thinking about comics? In consultation with Wiki Education Program Manager Will Kent, we deployed two Wikidata community tools for creating these visualizations: Wikidata Graph Builder (WGB) and the Wikidata Query Service (WDQS). We first recorded a podcast episode with Kent (2021) about the significance of Wikidata for participants, then created the “Graphic Possibilities Comics Wikidata Tutorial Spring ’21” video (Wigard 2021) demonstrating how WGB would initially allow our participants to search for related Wikidata items and then load them into WDQS for robust visualizations. After an initial lecture from Kent, we split our sixteen participants into four separate groups to create visualizations based on their assigned Golden Age comics publishers (Fawcett, Dell, Quality Comics, Charlton) and each group’s own emergent critical interests. Our participants ultimately created four digital visualizations: three related to comics publishers and one for a broad survey on comics publisher history. This event not only fostered an interrogation of the nature of knowledge creation and the authority around it, but also stressed the value of diverse communities in generating that knowledge. It was only through the combined knowledge of specialists, independent researchers, scholars, and students of all levels that these events were a success.

The Marmaduke Problem

Our Wiki-comics community also alerted us that, in many cases, the Library of Congress Name Authority files were applied incorrectly, an issue we have dubbed “the Marmaduke Problem.” This problem takes its name from the comic strip created by Brad Anderson, who is one of the more prominent victims.

There are many people who go by the name Brad Anderson. At the time of this writing, there are seven Library of Congress Name Authority records with the name Brad Anderson. Our collection attributes twenty-four Marmaduke titles to the name Brad Anderson. Logically, these should be attributed to Brad Anderson (LOC 1983), creator of Marmaduke, but, with two exceptions, all are attributed to Brad Anderson (LOC 2016), a colorist for DC Comics. The source for this Brad Anderson’s Name Authority File is a Justice League title, and the authority record contains the following editorial note: “Do not confuse with Brad Anderson, 1924-2015, creator of Marmaduke.” Despite this note, there are eighteen Marmaduke titles in this Brad Anderson’s “Contributor To Works” list.

Upon investigation, we traced this mistake beyond the MSUL catalog to OCLC, which provides most of the MSU MARC records. OCLC works in tandem with the Library of Congress to provide libraries with MARC records through the Program for Cooperative Cataloging (LOC n.d.b). These records make use of Library of Congress Name Authority Files to ensure authority control is consistent across institutions. The OCLC WorldCat Identity file for Brad Anderson conflates the two, listing DC titles alongside Marmaduke titles (OCLC WorldCat Identities n.d.a). Notably, among those titles is Swamp Thing: Roots of Terror, which credits Anderson among its colorists, and which was published in 2019 (OCLC WorldCat Identities n.d.b), four years after the death of the Marmaduke creator (Slotnik 2015). Under “Genres,” which is used to identify the genres an author is associated with, it lists “Cartoons (Humor)” and “Caricatures and cartoons,” terms we would associate with the creator of Marmaduke, alongside “Superhero comics” and “Science fiction comics,” which we would associate with the DC colorist. While this page does not serve as the authority file itself, it links the titles listed to the Library of Congress and Virtual International Authority Files as well as to the Wikipedia and Wikidata pages for Brad Anderson, the creator of Marmaduke. This authority control problem exists beyond our catalog, and it is out of scope for our project team to fix.5
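The date discrepancy noted above suggests a simple consistency check that could surface similar conflations for human review. The following Python sketch is an assumption-laden illustration, not a tool used in the project: the function names are hypothetical, and the rule (flag any work dated after the death year in the heading) is deliberately crude, since posthumous reprints are legitimate and a flag only marks a record for a closer look.

```python
import re
from typing import Optional


def death_year(heading: str) -> Optional[int]:
    """Extract the death year from a heading such as 'Anderson, Brad, 1924-2015'."""
    match = re.search(r"\b(\d{4})-(\d{4})\b", heading)
    return int(match.group(2)) if match else None


def is_suspect(heading: str, publication_year: int) -> bool:
    """Flag a work dated after the death year in its name heading."""
    died = death_year(heading)
    return died is not None and publication_year > died


if __name__ == "__main__":
    # A 2019 title under the heading of a creator who died in 2015.
    print(is_suspect("Anderson, Brad, 1924-2015", 2019))
    # An open-ended heading carries no death year, so nothing is flagged.
    print(is_suspect("Anderson, Brad, 1964-", 2019))
```

A check like this cannot decide which Brad Anderson is correct; it only narrows the list of records that someone with knowledge of the comics needs to examine.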

However, our Wikidata edit-a-thon also provided the solution to the Marmaduke Problem. Participants who identified attribution errors in the application of Library of Congress Name Authority Files or within the original OCLC records corrected these errors in Wikidata.

The openness and relatively low barrier to entry of Wikidata allowed our participants to correct errors in real time. When ambiguities arose, the Discord server provided a forum for participants to pool their knowledge and come to a consensus. Our collective approach identified and repaired flaws overlooked by traditional authority control processes. The “wisdom of the crowd,” facilitated by Wikidata, can supplement professional standards and further enable collections as data research.

Wikidata and the Case for Community-Led Authority Control

At the 2017 WikidataCon, Theo van Veen suggested that Wikidata could become a universal thesaurus for library authority records. Further, Wikidata could aggregate all identifiers for a given entity, and its Wikidata QID could then be used in a MARC record in place of a local identifier (van Veen 2017). This would simplify authority control by linking many lists into one system. Four years later, Wikidata has combined many authority lists, becoming a platform for both community-based authority control and collections as data.

Wikidata has a great capacity for modeling complex relationships that can be leveraged for scholarly research. As Dunst, Laubrock, and Wildfeuer (2018) note, the growth of digital technologies has opened new doors for humanistic study of comics. In describing our publishers, we found traditional authority records lacking because they do not connect subsidiaries and may not include all variants and aliases that users are aware of. Wikidata lists previous or alternate names and also provides information about when those names were used. For example, when comparing the Library of Congress Authority File for Marvel Comics Group (LOC 1980) with its Wikidata counterpart, Marvel Comics (Q173496), the Wikidata item includes more variant terms and aliases than the LOC file. The Wikidata item includes three official names, along with the time period in which each name was used. Including more variants in the Marvel Comics record connects more comics together despite the variation in the publisher name listed in the MARC records.

Wikidata also connects publishers to their founders, parent companies, and imprints, opening up a trove of new information about how comics publishing communities evolved over time. This is exemplified in a visualization made by the Fawcett group in our April 2021 Wikidata event (Figure 2). Using the Wikidata Query Service, the group created a network diagram centered on Fawcett Comics, which shows not only the corporate hierarchies in play around Fawcett, but also the connections between publishers, creators, and characters. In this example, we can see that Captain Marvel was published by both Fawcett and DC Comics. However, we can expand beyond this query using other information in Wikidata.

Figure 2: Screenshot of the Wikidata Query Visualization created by Karina Ocanas, Christine Eslao, and Allison Bailund, the Fawcett Publications group at the April 2021 Graphic Possibilities Wikidata Visualization event. See Ocanas, Eslao, and Bailund (2021) to view the visualization.
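For readers unfamiliar with the Wikidata Query Service, a query of the general kind behind such a network diagram might be assembled as follows. This Python sketch is not the Fawcett group's query, which is not reproduced in this report; the function name and the choice of relations are assumptions. It relies on the Wikidata properties publisher (P123), parent organization (P749), and founded by (P112), and it is shown with the QID for Marvel Comics because that is the identifier given in this report.

```python
def publisher_network_query(qid: str, limit: int = 200) -> str:
    """Build a SPARQL query for items linked to a publisher item."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    # Doubled braces are literal braces inside an f-string.
    return f"""
SELECT ?item ?itemLabel ?relation WHERE {{
  {{ ?item wdt:P123 wd:{qid} . BIND("published by" AS ?relation) }}
  UNION
  {{ wd:{qid} wdt:P749 ?item . BIND("parent organization" AS ?relation) }}
  UNION
  {{ wd:{qid} wdt:P112 ?item . BIND("founded by" AS ?relation) }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Q173496 is Marvel Comics, the QID cited in this report.
    print(publisher_network_query("Q173496"))
```

The printed text can be pasted into the Wikidata Query Service, where the result views can display the returned items and their relation to the publisher.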

When we expand “DC Comics” (Figure 3), we can observe the larger community around it and its subsidiaries. Here we can start to visualize the collaboration between WildStorm’s founder Jim Lee and writer Brandon Choi before WildStorm was sold to DC Comics in 1999 (Jimenez 2004; Phegley 2010). These connections are not available in other linked data sources. By connecting the MSUL collection to Wikidata, users can unearth new connections and discover more communities.

Figure 3: Screenshot of an expansion of the Fawcett Publications Visualization.

Moreover, pairing this community with Wikidata allowed us to create information that will support future cataloging efforts as well as future scholarship. Through the use of Wikidata, a freely available and accessible source of information, our work lowers the barrier of use for our library data, which fulfills SBSCD (2018) principle three. Unlike other linked data sources, Wikidata makes its data freely available through its API and query service and simplifies the editing process, therefore incorporating more information than other linked data services. Our Wikidata edit-a-thon showcased how much knowledge is spread throughout the comics community beyond MSU.6 By welcoming scholars, librarians, and enthusiasts with various specializations and without institutional boundaries, we created a large knowledge base. Attendees raised issues and shared knowledge, allowing us to add more information and disambiguate more entities than any of us could on our own.

As with any crowdsourced project, any user can add incorrect or disputed information to Wikidata. However, the emphasis on references, the use of property constraints, and the ability to deprecate or flag problematic statements mitigate this risk. Another user can mark an incorrect value as “deprecated,” but is required to explain why the value is incorrect and encouraged to add a reference. Each item includes a “discussion” page, which collects documentation and allows users to discuss ambiguities or inaccuracies in the item record. Through these functions, Wikidata encourages discussion, accountability, and transparency among its users, and with a low barrier to entry, users can more quickly detect and correct errors.
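Because a deprecated statement stays on the item rather than being deleted, anyone reusing the data has to decide how to treat it. The Python sketch below shows one way this might be handled when reading an item in Wikidata's JSON form, where each statement carries a `rank` of `preferred`, `normal`, or `deprecated`. The function name and the sample item are hypothetical, and the identifier strings are placeholders rather than real authority IDs; P244 is the Wikidata property for a Library of Congress authority ID.

```python
from typing import Any, Dict, List


def current_values(entity: Dict[str, Any], prop: str) -> List[Any]:
    """Return the values of one property, skipping deprecated statements."""
    values = []
    for statement in entity.get("claims", {}).get(prop, []):
        if statement.get("rank") == "deprecated":
            continue  # still recorded on the item, but not treated as current
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"])
    return values


# Hypothetical item: one accepted authority ID and one deprecated one.
SAMPLE_ITEM = {
    "claims": {
        "P244": [
            {"rank": "normal",
             "mainsnak": {"snaktype": "value",
                          "datavalue": {"value": "PLACEHOLDER-CREATOR-ID"}}},
            {"rank": "deprecated",
             "mainsnak": {"snaktype": "value",
                          "datavalue": {"value": "PLACEHOLDER-COLORIST-ID"}}},
        ]
    }
}

if __name__ == "__main__":
    print(current_values(SAMPLE_ITEM, "P244"))
```

Keeping the deprecated statement visible, instead of erasing it, is what lets later editors see that a mistaken attribution was considered and rejected.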

Conclusion and Next Steps

These Wikidata community events have led to two major developments. We created a Comic Art Collection Property Number (Wikidata 2021c), which allows easier use of Mix’n’match to create Wikidata items for our collection’s objects. Our event tutorials were shared publicly,7 supporting similar engagements with Wikidata outside of MSU and thus facilitating the use of the Comic Art Collection in scholarship. These developments support our ongoing MSU comics Wikidata events. While the initial stage focused on comics from 1929 to 1956, we will host at least two more events in the 2021–22 academic year. We will use Mix’n’match to add every item in our collection to Wikidata and populate those items with statements, creating new connections that can be leveraged to study comics communities, reanimating our metadata as knowledge.


Allison-Cassin, Stacy, and Dan Scott. 2018. “Wikidata: A Platform for Your Library’s Linked Open Data.” Code{4}Lib Journal 40.

Always Already Computational – Collections as Data. 2018. “The Santa Barbara Statement on Collections as Data, Version 1.” Always Already Computational – Collections as Data.

Clack, Doris Hargrett. 1990. Authority Control: Principles, Applications, and Instructions. Chicago: American Library Association.

Dunst, Alexander, Jochen Laubrock, and Janina Wildfeuer. 2018. “Comics and Empirical Research: An Introduction.” In Empirical Comics Research: Digital, Multimodal, and Cognitive Methods, edited by Alexander Dunst, Jochen Laubrock, and Janina Wildfeuer, 1–25. New York: Routledge.

Graphic Possibilities Department of English Graduate Workshop. 2021. Michigan State University. Last revised October 2021.

Kent, Will. 2021. “Will Kent.” The Graphic Possibilities Podcast, episode 7, January 29, 2021. Interview by Julian Chambliss, Nicole Huff, and Justin Wigard. Audio podcast, 29:19.

Library of Congress (LOC). n.d.a. “About NACO.”

Library of Congress (LOC). n.d.b. “Program for Cooperative Cataloging.”

Library of Congress (LOC). n.d.c. “The NACO FTP Process.” Archived at:

Library of Congress (LOC). 2016. “Anderson, Brad.” LC Name Authority File (LCNAF). Last revised March 9, 2016. Archived at:

Library of Congress (LOC). 1983. “Anderson, Brad, 1924-2015.” LC Name Authority File (LCNAF). Last revised March 15, 2016. Archived at:

Library of Congress (LOC). 2000. “Anderson, Brad, 1964-.” LC Name Authority File (LCNAF). Last revised June 13, 2017. Archived at:

Library of Congress (LOC). 1980. “Marvel Comics Group.” LC Name Authority File (LCNAF). Last revised September 26, 2019. Archived at:

Library of Congress (LOC). 1999. MARC 21 Format for Bibliographic Data. Library of Congress Network Development and MARC Standards Office. Revised November 24, 2021.

Michigan State University Libraries. n.d. “Datasets for Digital Research.” Michigan State University.

OCLC WorldCat Identities. n.d.a. “Anderson, Brad.” OCLC. Archived at:

OCLC WorldCat Identities. n.d.b. “Swamp Thing: Roots of Terror.” OCLC. Archived at:

Ocanas, Karina, Christine Eslao, and Allison Bailund. 2021. “Fawcett Publications Visualization.”

Padilla, Thomas, Hannah Scates Kettler, Stewart Varner, and Yasmeen Shorish. 2019. “Call for Proposals.” Collections as Data – Part to Whole. Archived at:

Scott, Randall W. 1991. Comics Librarianship: A Handbook. Folkestone: McFarland.

Slotnik, Daniel E. 2015. “Brad Anderson, Creator of ‘Marmaduke,’ Dies at 91.” New York Times, September 9, 2015.

Topham, Kate. 2020a. “Comics as data North America - location over time.” Flourish. Last revised June 2021. Archived at:

Topham, Kate. 2020b. “Mix’n’Match Tutorial – 2020 Graphic Possibilities Wikidata Edit-a-Thon.” YouTube video, 19:23. September 25, 2020.

van Veen, Theo. 2017. “Wikidata as Universal Library Thesaurus.” Presentation at WikidataCon 2017. Wikimedia Commons.

van Veen, Theo. 2019. “Wikidata: From ‘an’ Identifier to ‘the’ Identifier.” Information Technology and Libraries 38 (2): 72–81.

Whitson, Roger Todd, and Anastasia Salter. 2015. “Introduction: Comics and the Digital Humanities.” DHQ: Digital Humanities Quarterly 9 (4).

Wigard, Justin. 2021. “Graphic Possibilities: Comics Wikidata Tutorial (Spring ’21).” YouTube video, 12:19. April 19, 2021.

Wikidata. 2021a. “Brad Anderson (Q106989804).” Last revised February 4, 2022.

Wikidata. 2021b. “Marvel Comics (Q173496).” Last revised February 11, 2022.

Wikidata. 2021c. “Michigan State University Library Comic Art Collection Record Number (P9555).” Last revised December 23, 2021.

Wikidata. 2021d. “Wikidata Status Updates: 2021 04 26.” Last revised April 29, 2021.


1 The initial project team at Michigan State University consisted of Julian Chambliss (Department of English), Kate Topham (Digital Humanities Archivist), Devin Higgins (Digital Library Programmer), Ranti Junus (Systems Librarian), Kristin Mapes (Assistant Director of Digital Humanities), and Scout Calvert (Data Librarian).

2 Michigan State University Libraries has a history of creating datasets from library collections and making them publicly available on the Datasets for Digital Research page.

3 Authority control is a process in library cataloging that organizes information under headings, or “authorized” terms, such that Mark Twain will always be referred to as “Twain, Mark, 1835-1910” rather than copied from the item directly (Clack 1990). Using headings simplifies discoverability in library systems.

4 The Library of Congress Name Authority files are searchable through their Linked Data Service.

5 In order for our team to remedy this problem, we would need to go through the Name Authority Cooperative Program within the Program for Cooperative Cataloging. This process requires institutional buy-in, a week-long training workshop, and an ongoing partnership between MSU and the Library of Congress (LOC n.d.b, n.d.c).

6 Stacy Allison-Cassin and Dan Scott (2018) provide a case study of community adding information to Wikidata in “Wikidata: A Platform for Your Library’s Linked Open Data.”

7 Wikidata. “Wikidata Status Updates: 2021 04 26.” Last revised April 29, 2021.