The South Asian Canadian Digital Archive Thesaurus: Thesaurus Construction for the South Asian Canadian Diaspora

Magnus Berg
South Asian Studies Institute, University of the Fraser Valley

Satwinder Kaur Bains
South Asian Studies Institute, University of the Fraser Valley

Sadhvi Suri
South Asian Studies Institute, University of the Fraser Valley

The South Asian Canadian Digital Archive (SACDA) is a soon-to-be-released digital repository developed by the South Asian Studies Institute at the University of the Fraser Valley, located in Abbotsford, British Columbia, Canada. SACDA partners with memory institutions, individuals, families, and organizations to digitize, describe, and provide online public access to heritage materials created by, or relevant to, the South Asian Canadian diaspora. This project report will detail how SACDA is building a customized thesaurus to classify its digitized archival holdings, augment existing subject headings and thesauri, and fill in taxonomical gaps. Building on prior work done by alternative thesauri like the Homosaurus, Association for Manitoba Archives Indigenous Subject Headings, Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri, and the International Thesaurus of Refugee Terminology, among others, the SACDA thesaurus intends to fill in a vital gap in South Asian Studies subject control, particularly from a Canadian perspective.

Keywords: South Asian studies; thesaurus construction; digital repositories


How to cite this article: Berg, Magnus, Satwinder Kaur Bains, and Sadhvi Suri. 2022. The South Asian Canadian Digital Archive Thesaurus: Thesaurus Construction for the South Asian Canadian Diaspora. KULA: Knowledge Creation, Dissemination, and Preservation Studies 6(3).

Submitted: 24 June 2021 Accepted: 24 January 2022 Published: 27 July 2022

Copyright: © 2022 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



The South Asian Canadian Digital Archive (SACDA) is a forthcoming digital repository developed by the South Asian Studies Institute (SASI) at the University of the Fraser Valley in Abbotsford, British Columbia, Canada. SACDA partners with memory institutions, individuals, families, and organizations to digitize, describe, and provide online public access to heritage materials created by, or relevant to, the South Asian Canadian diaspora. SACDA uses the Library of Congress (hereafter LoC) Subject Headings (LCSH) and Thesaurus for Graphic Materials (TGM) to assign subject headings to records; however, the terms available to describe South Asian Canadian diasporic archival material are often lacking or outright non-existent. In many instances, broad headings for describing materials in SACDA are not available in LCSH and TGM. For example, while LCSH does have headings for Muslim or Christian gay men, there is no heading available for Sikh or Hindu gay men. In other cases, we require headings specific to our records or local context, such as Jor Malla, a gathering festival specific to Paldi, a small logging town on Vancouver Island. While it is evident why this festival is not included in LCSH, its presence and significance in our records mean that it warrants a subject heading in our repository.

To address these gaps, SACDA created a custom thesaurus relevant to the South Asian Canadian diaspora, to be used in conjunction with LCSH and TGM. The thesaurus construction is modeled after similar community-centered projects like the Homosaurus, which functions as a companion to existing thesauri rather than an outright replacement (Zwaaf 2020, 207–8). Alternative thesauri, such as the Homosaurus, provide cataloguers and information professionals with alternatives when dominant vocabularies like LCSH are insufficient to describe subjects that are culturally or locally specific. The SACDA thesaurus aims to fill in a similar gap. The thesaurus has been under construction by a small team of temporary project staff since October 2020. While it was initially conceived as a small controlled vocabulary limited to a few hundred concepts, it has since expanded in response to the limited concepts available in alternative vocabularies and to better classify the records contained within SACDA. This project report details why and how the SACDA thesaurus is being developed and where we hope to take the thesaurus in the coming years.

The Importance of the SACDA Thesaurus

Ever since the publication of Berman’s (1971) Prejudices and Antipathies, oppressive structures, misdescription, offensive and racist terminology, and erasure by dominant and colonial taxonomies and classification systems have been widely covered in library and information studies (LIS) scholarship (Berman 1971; Knowlton 2005; Johnson 2010; Littletree and Metoyer 2015; Biswas 2018; Bone and Lougheed 2018; Howard and Knowlton 2018; Lo 2019; Watson 2020). The description of information resources for equity-seeking groups is a growing topic of research. Though the critical reappraisal of cataloguing practices is not a new phenomenon, critical librarianship and critical cataloguing have emerged in both LIS online communities and scholarship as a means of encouraging active application of critical theory in librarianship practice (Martin 2021; Schroeder and Hollister 2014). Despite this flurry of intellectual and professional activity, there are few alternatives to LCSH in North America, ultimately requiring institutions to choose between continuing to use subject headings that may do a disservice to their stakeholders and designing their own controlled vocabulary where one does not exist.

The lack of relevant subject headings for South Asian studies is attributable to its underrepresentation in LIS and in mainstream scholarship. As observed by Biswas (2017, 3), far fewer information professionals have a subject focus on South Asian studies than on queer studies, women’s studies, or Black studies, which results in a lower number of subject access points and standards for South Asian studies. Since the LoC makes additions to the LCSH based on literary warrant—the philosophy that headings are only warranted when sufficient material on the topic exists—the amount of available South Asian studies scholarship also has a direct correlation to the number of relevant subject headings produced by LoC (Lo 2019).

Moreover, the terminology that LCSH uses to classify people from South Asia/the South Asian diaspora is inconsistent. While most subject headings in LCSH use some variation of South Asian or South Asian American, there are still around fifty headings that use East Indian, including Foreign workers, East Indian, East Indian Americans, and East Indian diaspora. In some cases, there are even synonymous headings for the same concept, such as South Asian American teenagers and East Indian American teenagers. The subject headings are also in some cases colonial. Biswas argues that the use of “East Indian” in LCSH headings “represents a problematic vestige of colonialism because the term’s origins are inextricably tied to a European imagination that was shaped by the writings and experiences of maritime travelers, Christian missionaries, and colonial expeditions” (2017, 1). TGM suffers from a wholly separate issue, in that it is a far broader taxonomy and avoids explicit reference to social identity categories altogether. As contended by Adler, “any mention of race [in TGM] is veiled—it is only to be implied . . . [images] cannot be retrieved or found by way of subject headings that describe race or ethnicity” (2017, 123). Due to the inconsistency of South Asian/South Asian diasporic subject headings in LCSH and the relative dearth of headings available to describe a wide variety of subjects common in archival records documenting the South Asian Canadian diaspora, the SACDA team found it necessary to construct our own thesaurus to fill in these gaps, and in some cases replace existing LCSH or TGM headings.

First Phase Thesaurus Construction


SACDA is built by a small project team with limited funding, which limited the scope of work we could complete on the thesaurus during this first phase. The team is composed of a temporary archivist, who functions as project lead, in addition to student researchers and co-op students. Due to the nature of student labour, there was high turnover on the project team from term to term, and subject expertise was typically divided between computer science and information science. In an average term, half of the project team was composed of South Asian Canadians or recent immigrants or international students from the South Asian subcontinent, with the remaining half composed of European Canadian settlers. Team members relied heavily on the community knowledge of our colleagues on the project team proper, on the advisory committee set up for this project, and within the larger research staff at SASI. While we initially set out to create a very small list of controlled terms to augment LCSH and TGM, limited to a couple hundred terms, it quickly became apparent that the gaps in these thesauri were much larger than we had anticipated and that we would have to significantly expand the thesaurus to ensure that we had appropriate subject headings for the records we were describing.

To facilitate this expansion, the project team took a two-pronged approach to populate the thesaurus. The first was to create ad-hoc headings during the description process when a term was deemed to be missing, or insufficient, in LCSH or TGM. These headings were created based on archival warrant, meaning that we had at least one record in the repository with that subject. The second was to hold thesaurus-building workshops on a quarterly basis. These workshops were a dedicated, informal opportunity for the project team to get together for half a day and comb through LCSH and TGM for missing headings. While the workshops were structured—in that we identified themes needing additional terms in advance of each workshop—the workshops were otherwise non-prescriptive and allowed team members to explore the LCSH and TGM for gaps as they saw fit. Unlike the ad-hoc headings, during the workshops we created new headings regardless of archival warrant. If we estimated that there was a possibility that a particular heading could be pertinent in the future, we added it to the thesaurus. These workshops allowed us to expand the thesaurus relatively quickly, as we averaged several hundred new headings per workshop.


When weighing whether to create a new heading or an alternative to an existing LCSH or TGM one, the SACDA team took into consideration the following questions:

  1. What terms does LCSH and/or TGM use to describe people in other racialized groups, ethnic groups, or religious groups?
  2. What have you observed while cataloguing SACDA’s collections so far? What gaps did you see in LCSH and/or TGM?
  3. Are there any historic events that are significant to the South Asian Canadian diaspora that may need their own term?
  4. Does a pre-existing term in LCSH/TGM do the subject at hand justice? Or is it too broad/narrow/limiting?
  5. Does a pre-existing term sanitize/minimize the severity of a subject?1
  6. Does the local community use a synonymous term different than a pre-existing term?

These questions gave us the conceptual grounding we needed to identify when a new subject heading was necessary and when pre-existing subject headings needed to be replaced or revised to fit our local context. There were also instances where we found the need to create new headings that were not exclusively related to the South Asian Canadian diaspora but were still necessary to describe our records, particularly those documenting cross-cultural communities and anti-Asian racism. Examples include creating headings for anti-Asian racism and the Chinese head tax and Chinese Exclusion Act,2 as well as replacing Japanese--Canada--Evacuation and relocation, 1942-1945 with Japanese--Canada--Internment and forced relocation, 1942-1949.3

Prior to each workshop, the SACDA team identified themes that we foresaw needing particular attention in the thesaurus based on both our existing records and our prior knowledge of LCSH gaps. These themes included: prejudice, racism, religion, events, ceremonies, art, language, caste/casteism, regions, identity, activism, migration, business, and local issues. Themes were assigned to certain members of the SACDA team depending on their subject expertise, localized knowledge, and linguistic capabilities. While some headings were created solely based on a perceived gap within LCSH/TGM, others were created based on their common use within the diaspora, such as replacing the LCSH term Sikh temples with Gurdwaras, Hindu temples with Mandirs, and Mosques with Masjids. While the thesaurus, and the forthcoming SACDA portal more broadly, will primarily be in English, we sometimes created headings in Indic languages when it was deemed that there was not an appropriate equivalent in English, that using an English translation would dilute its meaning, or that our users would be more likely to search for a particular subject using an Indic language term rather than an English language term. Examples include Sarna sthals,4 which has no appropriate English equivalent, and wedding ceremonies or rituals like Kanya Daan (the giving of the girl to the groom) and Vatna (pre-wedding cleansing ceremony for the bride and groom), which would have been subsumed under overly broad headings like Marriage service or Rites and ceremonies.
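The replacement of LCSH terms with community-preferred terms can be pictured as a simple lookup table. The following Python sketch is illustrative only: the three term pairs are taken from the paragraph above, but the data structure, the function names, and the idea of expanding a search query to cover both forms are assumptions and do not describe how the SACDA repository is implemented.

```python
# Hypothetical mapping from an LCSH term to the locally preferred heading.
# The three pairs come from the text; everything else here is assumed.
LOCAL_PREFERRED = {
    "Sikh temples": "Gurdwaras",
    "Hindu temples": "Mandirs",
    "Mosques": "Masjids",
}


def preferred_heading(term: str) -> str:
    """Return the locally preferred heading, or the term unchanged."""
    return LOCAL_PREFERRED.get(term, term)


def search_terms(query: str) -> set[str]:
    """Expand a query so that either form of a heading matches the other."""
    terms = {query}
    for lcsh_term, local_term in LOCAL_PREFERRED.items():
        if query in (lcsh_term, local_term):
            terms.update({lcsh_term, local_term})
    return terms


if __name__ == "__main__":
    print(preferred_heading("Sikh temples"))
    print(sorted(search_terms("Masjids")))
```

Keeping the LCSH form alongside the local form is one way a user who searches with either term could still reach the same records.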


The SACDA project team is made up of staff and student researchers in different disciplines and with different skill sets. As such, it was important for us to ensure that there was an approval process in place for the thesaurus to avoid issues like duplicate headings, synonymous headings, or inconsistent syntax. Further, since the thesaurus was being actively constructed in parallel to the rest of the repository’s development, there was considerable potential for records to be inconsistently classified. To ensure that our headings were created and applied consistently, we adhered to the workflow depicted in Figure 1. This workflow allowed us to be nimble enough to create new subject headings on the fly and to incorporate multiple viewpoints and skill levels without compromising the organizational integrity of the thesaurus. The archivist’s role as gatekeeper was primarily to avoid synonymous terms, duplicate terms, and inconsistent syntax rather than to question the validity or necessity of a proposed heading. To that end, the archivist assisted in revising or reshaping terms suggested by those without an LIS background so that they conformed to a common syntactical standard.

Figure 1: Flowchart illustrating the workflow used by the South Asian Canadian Digital Archive (SACDA) team to create new headings for the SACDA Thesaurus. New headings are flagged for review by the archivist, who either approves or rejects the heading. Rejected headings are replaced with more appropriate headings from the SACDA Thesaurus, LCSH, or TGM and justification is provided. Approved headings receive a persistent identifier (PID) and are subsequently ingested into the repository.
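The review workflow in Figure 1 can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the SACDA implementation: the class and function names, the normalization rule used to catch duplicates, and the UUID-based identifier are all hypothetical, since the report does not describe the PID scheme or the software logic.

```python
import uuid
from dataclasses import dataclass, field


def normalize(label: str) -> str:
    """Collapse case and whitespace so near-identical labels compare equal."""
    return " ".join(label.casefold().split())


@dataclass
class Thesaurus:
    # Maps a normalized label to its (label, persistent identifier) pair.
    headings: dict = field(default_factory=dict)

    def review(self, proposed: str, approve: bool,
               replacement: str = "", justification: str = "") -> dict:
        """Record the archivist's decision on one flagged heading."""
        key = normalize(proposed)
        if key in self.headings:
            # Assumed behaviour: reuse the existing heading, mint no new PID.
            label, pid = self.headings[key]
            return {"status": "duplicate", "heading": label, "pid": pid}
        if not approve:
            # A rejection carries a replacement heading and a justification.
            if not replacement or not justification:
                raise ValueError("rejection needs a replacement and a justification")
            return {"status": "rejected", "heading": replacement,
                    "justification": justification}
        # Placeholder identifier; the real PID format is not described.
        pid = f"pid-{uuid.uuid4()}"
        self.headings[key] = (proposed, pid)
        return {"status": "approved", "heading": proposed, "pid": pid}


if __name__ == "__main__":
    thesaurus = Thesaurus()
    print(thesaurus.review("Jor Malla", approve=True)["status"])
    print(thesaurus.review("jor  malla", approve=True)["status"])
```

Checking for duplicates before minting an identifier reflects the gatekeeping role described above, in which the review step exists to keep the vocabulary consistent rather than to judge whether a heading is needed.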

Classification of Identity

Classifying groups according to identity or geography has both positive and negative connotations. On the one hand, the classification of groups according to race, religion, sexual orientation, gender, class, caste, and ability levels has historically been used to perpetuate oppressive structures and systems against a particular identity group, up to and including genocide. For some, using any kind of identity category to classify records is a form of ghettoization. On the other hand, classification can also result in a sense of shared identity and belonging (Coleman-Fountain 2014; Porta et al. 2019; Rinderle and Montoya 2008; Kodama and Ebreo 2009). Not including certain identity categories in metadata records could be seen as a form of erasure and could ultimately make it difficult for users to retrieve records that deal explicitly with particular identity categories or that use disagreeable colonial nomenclature.5 Both of these viewpoints were taken into account when developing the SACDA thesaurus.

Ultimately, the SACDA team chose to create granular, identity-based thesaurus terms for two reasons: one, to facilitate information retrieval of specific, identity-based research topics that may be of interest to our users; and two, to create subject headings that could be used by other organizations whose collections are less focused on the South Asian Canadian diaspora. To mitigate issues that can sometimes arise with identity-based subject headings, we attempted to avoid “neutral” identity headings, which often reinforce stereotypes about certain subjects and/or ghettoize marginalized groups. For example, when creating a new heading for women (e.g., South Asian Canadian women journalists), we generally made a parallel term for men (e.g., South Asian Canadian men journalists). This allowed our cataloguers to use a term specifically for male South Asian Canadian journalists rather than using a broader “South Asian Canadian journalists” heading as a “neutral” heading when describing records with male subjects. We also created terms for racialized groups outside of South Asian Canadians, such as East Asian Canadians and White Canadians. This avoided similar issues around kyriarchical ideas of “default,” “neutral,” or “dominant” headings, and will allow our users to create more targeted queries around intercultural relations.

Future Work

At present, the SACDA thesaurus is composed of over sixteen hundred subject headings and continues to be actively developed by members of the SACDA project team. The thesaurus is currently stored in our digital repository system, CollectiveAccess. Because of the limited time and resources the SACDA team can devote to the development of the thesaurus, it has a simple, flat structure. Aside from the name of the heading itself, the only other information recorded with each heading is a persistent identifier. Although this simple structure works while the thesaurus and project team are relatively small, additional resources and structure would be necessary to scale the thesaurus. We hope to eventually provide more structure for the thesaurus and create linkages between subject headings, such as related headings and parent headings. We also hope to eventually convert the thesaurus to a format like RDF and make it available for use by other GLAM organizations.
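To make the planned structure concrete, the sketch below shows one way a flat record (a label plus a persistent identifier) could be extended with parent and related headings and then written out as RDF. It is a hypothetical illustration: the report names only "a format like RDF", so the choice of the SKOS vocabulary and Turtle syntax, the example.org namespace, the identifiers, and the parent relationship between the two sample headings are all assumptions.

```python
from dataclasses import dataclass, field

# Placeholder namespace; a published thesaurus would use its own base URI.
BASE = "https://example.org/thesaurus/"


@dataclass
class Heading:
    pid: str    # persistent identifier (format assumed)
    label: str  # name of the heading
    broader: list[str] = field(default_factory=list)  # PIDs of parent headings
    related: list[str] = field(default_factory=list)  # PIDs of related headings


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(headings: list[Heading]) -> str:
    """Serialize headings as SKOS concepts in Turtle syntax."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for h in headings:
        props = ["a skos:Concept", f'skos:prefLabel "{escape(h.label)}"@en']
        props += [f"skos:broader <{BASE}{p}>" for p in h.broader]
        props += [f"skos:related <{BASE}{p}>" for p in h.related]
        lines.append(f"<{BASE}{h.pid}> " + " ;\n    ".join(props) + " .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        Heading("h0001", "South Asian Canadian journalists"),
        Heading("h0002", "South Asian Canadian women journalists",
                broader=["h0001"]),
    ]
    print(to_turtle(sample))
```

Because the two relationship lists default to empty, a record with only a label and an identifier serializes without change, so a flat thesaurus could adopt a structure like this gradually.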


The following people contributed to the development of the SACDA thesaurus during its first phase: Sadhvi Suri, Alisa Sohi, Dr. Satwinder Kaur Bains, Tim Ubels, Benjamin Arends, Tania Teixeira, Magnus Berg, and Thamilini Jothilingam.

1 The intention with this question was to identify headings that utilize language that is overly neutral when describing historical events. For example, an existing heading in LCSH, Japanese--Canada--Evacuation and relocation, 1942-1945, was considered to be sanitizing as it frames the forced relocation and incarceration of Japanese Canadians as a neutral or necessary act rather than as an act of violence, prejudice, and theft on the part of the state.

2 Under the Chinese Immigration Act of 1885, Chinese immigrants to Canada were forced to pay a tax of between $50 and $500 to enter the country. It is the only legislation of its kind in Canadian history. The head tax was abolished in 1923 in favour of the new Chinese Immigration Act, also known as the Chinese Exclusion Act, which was a piece of legislation passed by the Canadian federal government that significantly restricted Chinese immigration to Canada until it was repealed in 1946. It is estimated that fewer than twenty Chinese immigrants were able to enter the country while the Act was in effect (Canadian Museum of Immigration at Pier 21 n.d.; McRae n.d.; Chan 2016).

3 We chose to extend the time period in this subject heading because Japanese Canadians were not allowed free movement until 1949. Our decision to use internment may still not be an appropriate one, as the difference between an internment camp and a concentration camp is contested. The term was intended to be broadly applicable to records whose subject was the forced relocation, forced labour, forced evacuation, and/or forced deportation of Japanese Canadians during, and after, the Second World War. For records where it was deemed warranted, it could be used in combination with the pre-existing LCSH heading Concentration camps--Canada.

4 Sarna sthals are sacred groves to practitioners of the Sarna religion.

5 While the SACDA Thesaurus makes a point to avoid colonial and outdated terminology, there are instances in our descriptions where we retain outdated/colonial/offensive/othering terminology when it is contextually relevant. This is done to facilitate the study of such language by scholars and to avoid erasing the history of white supremacy and colonization evident in archival records.