The Vaccine Safety Project on Wikipedia

There are numerous sources on the internet, particularly on social media, spreading misinformation about vaccine safety. It is difficult to control how far such messages spread, because most of these posts circulate in closed social media groups and other online echo chambers. The vaccine debate is more intense now than ever, owing to concerns about the safety and efficacy of the COVID-19 vaccines currently under development. In addition, the COVID-19 pandemic has disrupted vaccination services across the globe. All these factors make it important to communicate reliable, up-to-date information on vaccine safety to the public.

Wikipedia is one of the most popular knowledge platforms in the world. Health information on English Wikipedia receives enormous traffic, making it one of the most consulted health care resources in the world. The Wikipedia article about the COVID-19 vaccine has gathered over two million views. It is therefore important that Wikipedia’s vaccine safety information is up to date and reliable.

The Vaccine Safety Project launched this summer to identify and bridge knowledge gaps related to vaccine safety on English Wikipedia. Its pages were designed like a WikiProject, a portal where Wikipedians with similar interests collaborate. The portal has spaces for general discussion (talk), sharing vaccine-related news (news), listing articles related to vaccine safety (navbox), sharing tips for new editors (tips), listing sources and missing topics related to vaccine safety (sources, sources list, missing topics) and suggesting articles from Wikidata (Wikidata lists).

The Vaccine Safety Project also documented the existing knowledge related to vaccine safety on Wikipedia, which spans over 100 articles. The Sources list contains search strategies for finding relevant resources in medical repositories that hold vaccination information. The project also links to reference sources with relevant images and data that could be used to strengthen vaccine safety information on Wikipedia. One of the project’s features is the Missing Topics page, which maps vaccine safety topics that are not yet covered on Wikipedia. In addition to general topics, the page lists organizations related to vaccine safety and the vaccination status of individual countries. The resources listed on this page could later be used to create articles on these missing topics from scratch.

The Vaccine Safety Project uses data from Wikidata, Wikipedia’s sister project and a free structured data repository. The project uses Listeria, an automated script, to create a list of topics related to vaccines, journals about vaccines and vaccine-related journal articles. This list is updated every 24 hours, so that it reflects changes made on Wikidata. The entries in this list could be used to create new articles related to vaccine safety on Wikipedia.

As part of this project, bibliographic information related to vaccination from the National Academy of Sciences was uploaded to Wikidata. This was done in collaboration with Houcemeddine Turki, a Wikipedian working on bibliographic information on Wikidata and project lead of the WikiCred project RefB: Adding Reference Support to Biomedical WikiData Statements.

Information from the Vaccine Safety Project was used to run the first Vaccine Safety edit-a-thon, a community event where experts and newcomers came together to edit Wikipedia articles. The edit-a-thon was organized by NewsQ and Wikimedia DC, in partnership with the World Health Organization’s Vaccine Safety Network and the Stanford History Education Group. Approximately twenty-five people took part, including medical doctors, researchers and experienced Wikimedians. The event led to the creation of eight articles and the expansion of 461 articles. Similar events, including some in other languages, are being planned for next year to bridge knowledge gaps related to vaccine safety on Wikipedia.

If you are interested in leaving feedback about the Vaccine Safety Project, please do so on the talk page of the project here.

Writing about COVID-19 on Wikipedia

Last month was eventful not only in my personal and professional life, but also in my volunteer work. Through March and April, I have been regularly writing articles on English Wikipedia about COVID-19, mostly about its medical aspects, issues surrounding the impact of the pandemic, and people in leadership roles in the COVID-19 response.

I am used to doing everything in a structured way on Wikipedia, but COVID-19 changed everything. I usually take days or weeks to think about a new project on Wikipedia, then create a timeline and a work plan, and then work systematically on each aspect of the work. But in a crisis like a pandemic, this level of structure is not possible, so I am helping out wherever help is needed. Nowadays, I log in to Wikipedia in the morning, read the updates about the pandemic there and then go looking for missing topics. Because the pandemic is so recent, there is usually a lot to write about, especially its socio-economic impact. In addition, the tables on the disease’s epidemiology need to be updated, new regulations and lockdowns introduced in various countries need to be added, and biographies of notable individuals working on COVID-19 need to be created. I work on all these aspects.

I get my references from all kinds of sources, thanks to most journals making their COVID-19 research papers open access. Many magazines and newsletters, such as The Economist, have made their COVID-19 articles free to read without a subscription. The WHO, UNFPA, UNICEF, Human Rights Watch, Amnesty International and many other organisations have also published documents on COVID-19 and the impact of the pandemic on various spheres of life. I have drawn extensively on all these sources to create and expand articles on Wikipedia.

I have mostly been following the World Health Organisation (WHO) for the latest disease updates, so most of the information I bring to Wikipedia comes from the WHO. As of 9 April 2020, I have written around 25 articles related to COVID-19 on Wikipedia. The most popular one so far is 2020 coronavirus pandemic in Kerala. The article I am most proud of is Gendered Impact of the 2019-20 coronavirus pandemic. The article I think will be the most useful is List of unproven methods against COVID-19, given the misinformation circulating about the disease. Nearly 700 of my edits on English Wikipedia so far have been to articles related to COVID-19. The articles I started have been viewed around 35,000 times a day over the past month.

What am I going to do next? We are still in this pandemic and the situation is evolving rapidly (for better or for worse, we don’t know yet). So I am going to take things one day at a time, doing what is important today and not making any long-term plans. I will keep doing what I am doing on Wikipedia until help is no longer needed. As a Wikipedian, doctor and researcher, this is the least I can do to help people around the world access open and reliable information about COVID-19.

Stay safe, y’all.

 

How to identify misinformation related to coronavirus?

We live in an era of information overload and misinformation. Ever since the coronavirus became a cause for panic among the public, a lot of misinformation about it has been circulating on the internet. How can you tell whether a given piece of information is true?

  1. Check the source of the information. If the information comes from a website, check the URL to find out whether it belongs to a reliable organization. Sources you can rely on include your country’s government, the World Health Organization and established newspapers. Even Wikipedia has reliable information about the coronavirus pandemic. This is made possible by thousands of volunteers, including experts, who monitor pages related to the coronavirus and keep them accurate. There is a Wikipedia page on Misinformation related to the 2019-20 coronavirus pandemic, which records several instances of misinformation.
    If you received the information via a social media platform such as WhatsApp, be careful about its authenticity. If you are unsure whether it is true, ask the sender where the message originated. Encourage everyone to share only trusted information.
  2. Extraordinary claims need extraordinary evidence: if you find a post saying that a cure for the coronavirus disease has been found, or making similar tall claims, it is likely wrong. If a vaccine or medicine for the coronavirus is indeed developed, it will be reported everywhere, not just in a single forwarded message.
  3. If you find something like “The truth behind the coronavirus pandemic”, or anything else with the word ‘truth’ in its title, it is likely sharing an unpopular opinion and is therefore likely to be false. Those telling the truth don’t need to insist that they are telling the truth, but liars need to do so from time to time to keep their lies spreading.
  4. If a coronavirus-related post promotes an ideology or a religion, it may be false. In their zeal to put their ideology or religion first, people tend to create and spread all kinds of news, including fake news. Neither capitalism nor communism has figured out how to control the spread of the coronavirus. Neither Hinduism nor Islam has solutions for preventing disease transmission.
  5. Take extra care when you SHARE information. Only share posts that you know are true. Don’t be part of the fake news chain.

Gender Gap in Wikipedia’s content

Only 15% of all biographies on English Wikipedia are about women. Women and men are portrayed differently on Wikipedia in terms of article structure, use of infoboxes, network properties, notability and more. This research project aims to map the gender gap in Wikipedia’s content. It is a follow-up report to my presentation at WikiWomenCamp 2017, and its aim is to review peer-reviewed research papers on the gender gap in Wikipedia’s content.

Methods

  • Find all relevant articles for the analysis using Google Scholar. Keywords used are ‘Wikipedia’, ‘gender’, ‘content’, ‘women’, ‘bias’ and various relevant combinations of these words.
  • Screen titles and abstracts to include only studies that meet the inclusion criteria, then screen the full content to include only studies about the gender gap in Wikipedia’s content.
  • Assess the validity and reliability of the results.
  • Present the findings systematically.

Rosie Stephenson Goodnight has worked extensively on bridging the gender gap in English Wikipedia. Photo: Camelia.boban, CC-BY-SA 4.0, Wikimedia Commons

Results

The results are summarized under four categories:

  • Coverage bias: Coverage bias occurs when men and women are covered differently on Wikipedia. For example, it may manifest as a difference in the number of notable women and men portrayed on Wikipedia.
Wagner et al. [1]
Data: Wikipedia in six language editions.
Methods: The six language editions were compared with several reference datasets (Freebase, Pantheon, Human Accomplishment); the content of articles about people in the reference datasets was crawled using Wikipedia’s API (November 2014).
Findings: Men and women are covered equally well on Wikipedia, and articles about women tend to be longer than articles about men, when compared with the reference datasets.

Graells-Garrido et al. [2]
Data: The DBpedia 2014 dataset and the English Wikipedia dump of October 2014.
Methods: The DBpedia data and the Wikipedia dump were analysed for metadata properties. Where a biography’s gender was not stated, it was determined using ‘inferred gender for Wikipedia biographies’ (Bamman and Smith).
Findings: 15% of articles in the ‘Person’ class were about women. Compared with the global proportion of women, the categories that over-represent women are Artist, Royalty, FictionalCharacter, Noble, BeautyQueen and Model.

Reagle & Rhue [3]
Data: Biographical subjects from several sources (100 Most Influential Figures in American History, TIME magazine’s list of 2008’s most influential people, Chambers Biographical Dictionary, American National Biography Online), compared against English Wikipedia and Britannica.
Methods: A Python program compared web pages about the subjects drawn from the reference sources; the Google API was queried for the top four results. Gender was guessed from the balance of gendered pronouns (she, her, he, his). Article length was measured in words of article content, excluding citations and other miscellany.
Findings: Wikipedia provides better coverage of women, and longer articles about them, than Britannica. Wikipedia has more articles about women than Britannica in absolute terms, but compared with Britannica, articles about women are more likely than articles about men to be missing from Wikipedia.

Wagner et al. [4]
Data: The DBpedia 2014 dataset and inferred gender for Wikipedia biographies.
Methods: Calculated the number of language editions in which each biography is represented and the Google search volume for women’s biographies, and compared these with Wikipedia articles.
Findings: Women in Wikipedia are more notable than men, which the authors interpret as the outcome of a subtle glass ceiling effect.
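
Reagle and Rhue guessed each subject’s gender from the balance of gendered pronouns (she, her, he, his). Their program is not reproduced here, so the following is only a minimal Python sketch of that heuristic, assuming a simple lowercase word tokenizer. The function name guess_gender and the “unknown” result for ties are hypothetical choices made for this illustration.

```python
import re
from collections import Counter

# Pronoun sets from the description above: she/her vs. he/his.
FEMALE_PRONOUNS = {"she", "her"}
MALE_PRONOUNS = {"he", "his"}


def guess_gender(text: str) -> str:
    """Guess a biography subject's gender from the balance of gendered pronouns.

    Hypothetical helper for illustration: returns "female", "male",
    or "unknown" when the counts are tied (including when none occur).
    """
    # Lowercase and split into whole words, so "the" is not counted as "he".
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    female = sum(counts[w] for w in FEMALE_PRONOUNS)
    male = sum(counts[w] for w in MALE_PRONOUNS)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "unknown"


if __name__ == "__main__":
    sample = "Marie Curie was a physicist. She won two Nobel Prizes for her research."
    print(guess_gender(sample))  # female
```

A pronoun count like this is cheap but crude: a woman’s biography that mentions her husband often can tip towards “male”, which is one reason later studies relied on curated gender datasets instead.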
  • Structural bias: Structural bias refers to gender-dependent tendencies in how articles about notable people are linked to one another. For example, articles about women may contain more links to biographies of men.

WikiGap is a program dedicated to closing the content gap on Wikimedia.

Wagner et al. [1]
Data: Wikipedia in six language editions.
Methods: Data collected via Wikipedia’s API (November 2014) were analysed for the probability that a link from an article with gender g1 ends in an article with gender g2.
Findings: Articles about women connect less to articles about men via interlinks. Articles about people of the same gender tend to link to each other. Articles about women tend to link more to articles about men than the opposite. Men are more central than women in the English, Russian and German Wikipedias.

Graells-Garrido et al. [2]
Data: The DBpedia 2014 dataset and the English Wikipedia dump of October 2014.
Methods: The proportion of links from each gender to each gender was calculated and tested against expected proportions. The distribution of PageRank by gender was analysed to understand centrality.
Findings: Biographies of women tend to link more to other women than to men. The articles with the highest centrality tend to be predominantly about men, beyond what one could expect from the structure of the network.

Wagner et al. [4]
Data: The DBpedia 2014 dataset, inferred gender for Wikipedia biographies, attributes and PageRank.
Methods: Explored to what extent the connectivity between people is influenced by gender, and investigated the relation between people’s centrality and their gender using PageRank.
Findings: The top-ranked women according to PageRank are slightly less central than men, and the centrality of women decreases faster than that of men with decreasing rank. There is a bias in how Wikipedia editors create links, favoring articles about men.
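
Both Wagner et al. and Graells-Garrido et al. look at how often links from articles about one gender land on articles about another. The sketch below illustrates that basic proportion in Python on a toy link list. It is not either study’s code: the helper names and data are hypothetical, and the “expected” baseline simply assumes links land on articles at random in proportion to each gender’s share of articles, which is a simplification of the statistical tests used in the papers.

```python
from collections import Counter


def link_proportions(links, gender):
    """Share of links from articles of gender g1 that point to articles of gender g2.

    links:  iterable of (source_article, target_article) pairs
    gender: dict mapping article title -> gender label
    Returns {(g1, g2): proportion}; proportions for each g1 sum to 1.
    """
    pair_counts = Counter()
    source_totals = Counter()
    for source, target in links:
        g1, g2 = gender[source], gender[target]
        pair_counts[(g1, g2)] += 1
        source_totals[g1] += 1
    return {pair: n / source_totals[pair[0]] for pair, n in pair_counts.items()}


def expected_proportions(gender):
    """Naive baseline (an assumption of this sketch): each gender's share of all articles."""
    counts = Counter(gender.values())
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


# Toy data: four hypothetical biographies and five links between them.
gender = {"A": "female", "B": "female", "C": "male", "D": "male"}
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "C")]

observed = link_proportions(links, gender)
baseline = expected_proportions(gender)
for (g1, g2), share in sorted(observed.items()):
    print(f"{g1} -> {g2}: observed {share:.2f}, baseline {baseline[g2]:.2f}")
```

In this toy graph, two thirds of the links from women’s biographies point to men’s biographies, above the 50% baseline, while men’s biographies link only to other men. A real analysis would run this over the full link graph and test whether such gaps exceed what chance would produce.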
  • Lexical bias: Lexical bias refers to inequalities in the terms used to describe men and women on Wikipedia. For example, articles about women are more likely to include details about their family life.
Wagner et al. [1]
Data: Wikipedia in six language editions.
Methods: An open-vocabulary approach in which a classifier determines which words are most effective at distinguishing the gender of the person an article is about. Log-likelihood ratios are used to compare different feature-outcome relationships.
Findings: Male-related words are less salient in articles about men, which can be related to the idea of male as the null gender (a social bias towards assuming male as the standard gender in certain social situations). Words like “married”, “divorced”, “children” and “family” are much more frequent in articles about women. The study confirms that men and women are presented differently on Wikipedia and that these differences go beyond what we would expect from the history of gender inequalities.

Graells-Garrido et al. [2]
Data: The DBpedia 2014 dataset, the English Wikipedia dump of October 2014 and the Linguistic Inquiry and Word Count (LIWC) dictionary.
Methods: To explore which words are most strongly associated with each gender, Pointwise Mutual Information was measured over the vocabulary of both genders. The study also considered burstiness, a measure of a word’s importance in a single document based on how many times it appears there, under the assumption that important words appear more than once (in bursts) when they are relevant to a given document.
Findings: Marriage- and sex-related content is more frequent in women’s biographies, while cognition-related content is highlighted in men’s biographies. The words most associated with men are mostly about sports, while the words most associated with women relate to arts, gender and family. Of particular interest are two concepts strongly associated with women: “her husband” and “first woman”.

Wagner et al. [4]
Data: An overview of English Wikipedia biographies and inferred gender for Wikipedia biographies.
Methods: Analysed gender, relationship and family topics in Wikipedia’s biographies. Quantified the tendency to express positive and negative aspects of biographies with adjectives, as a measure of the degree of abstraction of positive and negative content.
Findings: Family-, gender- and relationship-related topics are more present in biographies of women. Linguistic bias manifests in Wikipedia, since abstract terms tend to be used to describe positive aspects in the biographies of men and negative aspects in the biographies of women.
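
Graells-Garrido et al. used Pointwise Mutual Information (PMI) to find the words most associated with each gender. PMI compares how often a word and a gender occur together with how often they would co-occur by chance: PMI(w, g) = log2 [ p(w, g) / (p(w) p(g)) ]. A positive score means the word is over-represented in that gender’s biographies; a negative score means it is under-represented. Below is a minimal Python sketch that estimates the probabilities from pooled token counts; the counting scheme, the helper name pmi_by_gender and the toy documents are assumptions for illustration, not the study’s exact pipeline.

```python
import math
from collections import Counter


def pmi_by_gender(docs):
    """Pointwise Mutual Information between each word and each gender.

    docs: list of (gender_label, tokens) pairs, one per biography.
    Probabilities are estimated from token counts pooled over all docs:
      p(w, g) = count(w in g) / N,  p(w) = count(w) / N,  p(g) = tokens(g) / N
    Returns {(word, gender): PMI in bits}.
    """
    joint = Counter()
    for g, tokens in docs:
        for w in tokens:
            joint[(w, g)] += 1

    total = sum(joint.values())
    word_counts = Counter()
    gender_counts = Counter()
    for (w, g), n in joint.items():
        word_counts[w] += n
        gender_counts[g] += n

    return {
        (w, g): math.log2(
            (n / total) / ((word_counts[w] / total) * (gender_counts[g] / total))
        )
        for (w, g), n in joint.items()
    }


# Toy, hand-made token lists (not real Wikipedia data).
docs = [
    ("female", ["her", "husband", "actress"]),
    ("male", ["his", "football", "career"]),
    ("female", ["first", "woman", "career"]),
]
scores = pmi_by_gender(docs)
for (w, g), s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{w:10s} {g:6s} {s:+.3f}")
```

On this toy input, “husband” scores positive for women and “football” for men. “career” appears once under each gender, yet it scores positive for men and negative for women simply because the men’s side has fewer tokens, a reminder that raw PMI is noisy on small counts.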
  • Visibility bias: Visibility bias occurs when articles about men and women are promoted differently within Wikipedia. For example, men’s biographies may be more likely than women’s to become featured articles, although the difference is not significant.
Wagner et al. [1]
Data: Wikipedia in six language editions.
Methods: Measured the proportion of women’s biographies that make it to the main page of Wikipedia.
Findings: The Wikipedia community’s selection procedure for featured articles does not suffer from gender bias.

 

Women Wikimedians at WikiConference India 2016. According to a 2011 survey, only 3% of Indian Wikimedians were women. Photo: Afifa Afrin, CC-BY-SA, Wikimedia Commons

References

  1. Wagner, Claudia; Garcia, David; Jadidi, Mohsen; Strohmaier, Markus (May 2015). “It’s a man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia”. Proceedings of the Ninth International AAAI Conference on Web and Social Media. Retrieved 28 July 2017.
  2. Graells-Garrido, Eduardo; Lalmas, Mounia; Menczer, Filippo (September 2015). “First Women, Second Sex: Gender Bias in Wikipedia”. Social and Information Networks. Retrieved 28 July 2017.
  3. Reagle, Joseph; Rhue, Lauren (2011). “Gender Bias in Wikipedia and Britannica”. International Journal of Communication 5: 1138–1158. Retrieved 28 July 2017.
  4. Wagner, Claudia; Graells-Garrido, Eduardo; Garcia, David; Menczer, Filippo (2016). “Women through the glass ceiling: gender asymmetries in Wikipedia” (PDF). EPJ Data Science. Retrieved 30 July 2017.

 

The same article can be found on meta-wiki here. A longer presentation containing information regarding gender gap research on Wikipedia can be found here.

Featured photo courtesy: Martina Cora, CC-BY-SA 4.0, Wikimedia Commons