Gender Gap in Wikipedia’s content

Only 15% of all biographies on English Wikipedia are about women. Women and men are portrayed differently on Wikipedia in terms of article structure, the use of infoboxes, network properties, notability, etc. This research project aims to map the gender gap in Wikipedia’s content. This work is a follow-up report to my presentation at WikiWomenCamp 2017. The aim is to create a review of peer-reviewed research papers on the gender gap in Wikipedia’s content.

Methods

  • Find all relevant articles for the analysis using Google Scholar. The keywords used are ‘Wikipedia’, ‘gender’, ‘content’, ‘women’, ‘bias’ and various relevant combinations of these words.
  • Screen the titles and abstracts to include only those studies that fit the inclusion criteria, then screen the full content to include only the studies about the gender gap in Wikipedia’s content.
  • Assess the validity and reliability of the results.
  • Present the findings systematically.
Rosie Stephenson-Goodknight has worked extensively on bridging the gender gap in English Wikipedia. Photo: Camelia.boban, CC-BY-SA 4.0, Wikimedia Commons

Results

The results were summarized under four categories:

  • Coverage bias: Coverage bias occurs when men and women are covered differently on Wikipedia. For example, the coverage bias may manifest as differences in the number of notable women and men portrayed on Wikipedia.

Research: Wagner et al. [1]
Data: Wikipedia in 6 language editions.
Methods: Wikipedia in 6 languages was compared to several reference datasets (Freebase, Pantheon, Human Accomplishment). The content of articles about people in the reference datasets was crawled using Wikipedia’s API (November 2014).
Findings: Men and women are covered equally well on Wikipedia, and articles about women tend to be longer than articles about men on Wikipedia, when compared to those from the reference datasets.

Research: Graells-Garrido et al. [2]
Data: The DBPedia 2014 dataset; the English Wikipedia dump of October 2014.
Methods: The DBPedia and Wikipedia data dumps were analysed for metadata properties. The gender of a biography, whenever not mentioned, was determined by ‘inferred gender for Wikipedia biographies’ (Bamman and Smith).
Findings: 15% of articles in the ‘Person’ class were about women. In comparison to the global proportion of women, the categories that over-represent women are Artist, Royalty, FictionalCharacter, Noble, BeautyQueen, and Model.

Research: Reagle & Rhue [3]
Data: Biographical subjects from several sources (100 Most Influential Figures in American History, TIME magazine’s list of 2008’s most influential people, Chambers Biographical Dictionary, American National Biography Online) compared to English Wikipedia and Britannica.
Methods: A Python program was used to compare web pages related to the subjects targeted in the reference sources. The Google API was queried for the top four results. Gender was guessed by the balance of gendered pronouns (she, her, he, his). The length of an article was determined by the number of words of article content, excluding citations and other miscellany.
Findings: Wikipedia provides better coverage and longer articles on women than Britannica. Wikipedia has more articles about women than Britannica in absolute terms, but articles about women on Wikipedia are more likely to be missing than articles about men compared to Britannica.

Research: Wagner et al. [4]
Data: DBPedia 2014 dataset; inferred gender for Wikipedia biographies.
Methods: Calculated the number of language editions in which each biography is represented and the Google search volume for women’s biographies, and compared them with Wikipedia articles.
Findings: Women in Wikipedia are more notable than men, which the authors interpret as the outcome of a subtle glass ceiling effect.
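The pronoun-balance heuristic that Reagle & Rhue [3] used to guess gender can be illustrated with a short Python sketch. This is only a minimal illustration, not the authors' program: the function name guess_gender, the tokenization, and the rule of returning "unknown" on a tie are assumptions made here.

```python
import re
from collections import Counter

# Pronoun sets taken from the description above (she, her, he, his).
FEMALE_PRONOUNS = {"she", "her"}
MALE_PRONOUNS = {"he", "his"}


def guess_gender(text):
    """Guess the gender of a biography's subject from pronoun counts.

    Hypothetical helper: compares how often female and male pronouns
    occur and returns "female", "male", or "unknown" on a tie.
    """
    # Lowercase the text and split it into alphabetic tokens.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    female = sum(counts[w] for w in FEMALE_PRONOUNS)
    male = sum(counts[w] for w in MALE_PRONOUNS)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "unknown"
```

A heuristic like this can misclassify short articles or articles that mention a spouse more often than the subject, which is one reason to treat it as a guess rather than a label.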
  • Structural bias: Structural bias refers to gender-specific tendencies in how articles about notable people are linked to one another. For example, there may be more links to men’s biographies on articles related to women.
WikiGap is a program dedicated to closing the content gap on Wikimedia.
Research: Wagner et al. [1]
Data: Wikipedia in 6 language editions.
Methods: Data from Wikipedia’s API (November 2014) was analysed for the probability that a link from an article with gender g1 ends in an article with gender g2.
Findings: Articles about women connect less to articles about men via interlinks. Articles about people with the same gender tend to link to each other. Articles about women tend to link more to articles about men than the opposite. Men are more central than women in the English, Russian and German language Wikipedias.

Research: Graells-Garrido et al. [2]
Data: The DBPedia 2014 dataset; the English Wikipedia dump of October 2014.
Methods: The proportion of links from gender to gender was calculated and tested against expected proportions. The distribution of PageRank by gender was analysed to understand centrality.
Findings: Women’s biographies tend to link more to other women than to men. The articles with the highest centrality tend to be predominantly about men, beyond what one could expect from the structure of the network.

Research: Wagner et al. [4]
Data: DBPedia 2014 dataset; inferred gender for Wikipedia biographies; attributes; PageRank.
Methods: Explored to what extent the connectivity between people is influenced by gender. Investigated the relation between the centrality of people and their gender using PageRank.
Findings: The top-ranked women according to PageRank are slightly less central than men, and the centrality of women decreases faster than that of men with decreasing rank. There exists a bias in the generation of links by Wikipedia editors, favoring articles about men.
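The gender-to-gender link proportions used in these studies can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code from any of the papers: the function names, the input format (a dictionary of article to gender and a list of source/target pairs), and the choice of each gender's share of biographies as the comparison baseline are all hypothetical, and no significance test is included.

```python
from collections import Counter, defaultdict


def link_proportions(gender, links):
    """Share of links from biographies of gender g1 that end in gender g2.

    gender: dict mapping article id -> gender label (assumed input format)
    links:  iterable of (source_id, target_id) pairs
    Links touching an article of unknown gender are skipped.
    """
    counts = defaultdict(Counter)
    for source, target in links:
        if source in gender and target in gender:
            counts[gender[source]][gender[target]] += 1
    proportions = {}
    for g1, targets in counts.items():
        total = sum(targets.values())
        proportions[g1] = {g2: n / total for g2, n in targets.items()}
    return proportions


def baseline_shares(gender):
    """Share of biographies per gender, used here as a simple baseline."""
    totals = Counter(gender.values())
    n = sum(totals.values())
    return {g: c / n for g, c in totals.items()}
```

Comparing an observed proportion such as link_proportions(gender, links)["woman"]["man"] with baseline_shares(gender)["man"] gives a first impression of whether links favour one gender more than the composition of the biographies alone would suggest.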
  • Lexical bias: Lexical bias refers to the inequalities in the terms used to describe men and women on Wikipedia. For example, the articles about women are more likely to have details about their family life.

Research: Wagner et al. [1]
Data: Wikipedia in 6 language editions.
Methods: An open vocabulary approach in which a classifier determines which words are most effective in distinguishing the gender of the person an article is about. Log likelihood ratios are used for comparing different feature-outcome relationships.
Findings: There is lower salience of male-related words in articles about men, which can be related to the idea of male as the null gender (there is a social bias to assume male as the standard gender in certain social situations). Words like “married”, “divorced”, “children” or “family” are much more frequently used in articles about women. This study confirms that men and women are presented differently on Wikipedia and that those differences go beyond what we would expect due to the history of gender inequalities.

Research: Graells-Garrido et al. [2]
Data: The DBPedia 2014 dataset; the English Wikipedia dump of October 2014; the Linguistic Inquiry and Word Count (LIWC) dictionary.
Methods: To explore which words are more strongly associated with each gender, Pointwise Mutual Information was measured over the vocabulary of both genders. The study also considered burstiness, a measure of word importance in a single document according to the number of times the word appears within the document, under the assumption that important words appear more than once (they appear in bursts) when they are relevant in a given document.
Findings: Marriage- and sex-related content is more frequent in women’s biographies, and cognition-related content is highlighted in men’s biographies. Words most associated with men are mostly about sports, while the words most associated with women relate to arts, gender and family. Of particular interest are two concepts strongly associated with women: “her husband” and “first woman”.

Research: Wagner et al. [4]
Data: Overview of English Wikipedia biographies; inferred gender for Wikipedia biographies.
Methods: Analysed the gender topic, relationship topic and family topic in Wikipedia’s biographies. Quantified the tendency of expressing positive and negative aspects of biographies with adjectives, as a measure of the degree of abstraction of positive and negative content.
Findings: Family-, gender-, and relationship-related topics are more present in biographies about women. Linguistic bias manifests in Wikipedia, since abstract terms tend to be used to describe positive aspects in the biographies of men and negative aspects in the biographies of women.
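Pointwise Mutual Information, the association measure mentioned for Graells-Garrido et al. [2], compares how often a word and a gender occur together with how often they would co-occur if they were independent: PMI(w, g) = log( P(w, g) / (P(w) P(g)) ). The Python sketch below is a minimal illustration, assuming document-level counts (a word counts once per biography) and whitespace tokenization; the function name and input format are hypothetical, and the paper's own preprocessing may differ.

```python
import math
from collections import Counter


def pmi_by_gender(biographies):
    """Compute PMI between words and gender labels.

    biographies: list of (gender, text) pairs (assumed input format).
    Probabilities are estimated as fractions of biographies, so each
    word is counted at most once per biography.
    Returns a dict mapping (word, gender) -> PMI for observed pairs only.
    """
    n = len(biographies)
    gender_counts = Counter()
    word_counts = Counter()
    joint_counts = Counter()
    for gender, text in biographies:
        words = set(text.lower().split())  # naive tokenization
        gender_counts[gender] += 1
        for word in words:
            word_counts[word] += 1
            joint_counts[(word, gender)] += 1
    pmi = {}
    for (word, gender), joint in joint_counts.items():
        p_joint = joint / n
        p_word = word_counts[word] / n
        p_gender = gender_counts[gender] / n
        pmi[(word, gender)] = math.log(p_joint / (p_word * p_gender))
    return pmi
```

A positive value means the word appears with that gender more often than independence would predict, and a value near zero means no particular association. Pairs that never co-occur are left out of the result because their PMI would be negative infinity.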
  • Visibility bias: Visibility bias occurs when articles related to men and women are differently promoted within Wikipedia. For example, men’s biographies are potentially more likely to be featured articles than women’s biographies, although the difference is not significant.

Research: Wagner et al. [1]
Data: Wikipedia in 6 language editions.
Methods: Proportion of women’s biographies that make it to the main page of Wikipedia.
Findings: The Wikipedia community’s selection procedure for featured articles does not suffer from gender bias.

 

Women Wikimedians at WikiConference, India 2016. According to a 2011 survey, only 3% of Indian Wikimedians were women. Photo: Afifa Afrin, CC-BY-SA, Wikimedia Commons

References

  1. Wagner, Claudia; Garcia, David; Jadidi, Mohsen; Strohmaier, Markus (May 2015). “It’s a man’s Wikipedia? Assessing Gender Inequality in an online Encyclopedia”. Proceedings of the Ninth International AAAI Conference on Web and Social Media. Retrieved 28 July 2017.
  2. Graells-Garrido, Eduardo; Lalmas, Mounia; Menczer, Filippo (September 2015). “First Women, Second Sex: Gender Bias in Wikipedia”. Social and Information Networks. Retrieved 28 July 2017.
  3. Reagle, Joseph; Rhue, Lauren (2011). “Gender bias in Wikipedia and Britannica”. International Journal of Communication 5: 1138–1158. Retrieved 28 July 2017.
  4. Wagner, Claudia; Graells-Garrido, Eduardo; Garcia, David; Menczer, Filippo (2016). “Women through the glass ceiling: gender asymmetries in Wikipedia” (PDF). EPJ Data Science. Retrieved 30 July 2017.

 

The same article can be found on meta-wiki here. A longer presentation containing information regarding gender gap research on Wikipedia can be found here.

Featured photo courtesy: Martina Cora, CC-BY-SA 4.0, Wikimedia Commons

Women at Wikimania 2013

Wikimania 2013, the annual conference of the Wikimedia movement, drew more than 60 women participants. As of July 30, women accounted for 20 percent of online registrations for Wikimania 2013. There was a separate track for ‘Women in Wikimedia’ on Day 2 of the conference. Around 40 women participated in the WikiWomen’s Luncheon, which took place on the same day.

Organizing team

The 11-member Program Committee included two women, Katie Filbert and Sarah Stierch. Ellie Young facilitated and supported the organizing team in her capacity as the conference coordinator of the Wikimedia Foundation. Katie Chan was a member of the scholarship committee of Wikimania 2013.

Keynote by Sue Gardner

The keynote on the final day of the conference was delivered by Sue Gardner, the Executive Director of the Wikimedia Foundation. In response to a question from the press, she replied: “I wish we had solved the (gender gap) problem (in Wikimedia), but we didn’t.”

WikiWomen’s Lunch during Wikimania 2013. Sue Gardner, CC-BY-SA.

WikiWomen’s Luncheon

The WikiWomen’s Luncheon, the luncheon for women attendees of Wikimania 2013, was held on the second day of the conference. Around 40 women participated in the luncheon. The conversation was facilitated by Sue Gardner. She said that participation in the WikiWomen’s Lunch has risen from 11 in Taipei, 2011 to more than 100 in Washington D.C., 2012. Gardner observed that when Wikimedia’s editor community is dominated by educated males, and expansion is by word-of-mouth, it will not “naturally grow to be as diverse as it otherwise could have been.” Sarah Stierch, the Program Evaluation Community Coordinator for the Wikimedia Foundation, shared her experiences about volunteering with the Wikimedia Foundation. Staff members of Wikimedia Deutschland shared information and distributed flyers about their upcoming Diversity Conference, which is scheduled to take place in Berlin in November.

Women speakers

Sue Gardner at Wikimania 2013. By Lvova [CC-BY-SA-3.0], via Wikimedia Commons

 

Talks, panels, pre-conference events and workshops by women speakers were:

  1. Open Street Map Workshop (Katie Filbert)
  2. Women and non-conventional education – a study from Indian cultural context (Kavya Manohar)
  3. Growing the Arabic Wikipedia through the Wikipedia Education Program (LiAnna Davis)
  4. Encouraging the creation and development of articles about women in Ibero-America (Ivana Lysholm)
  5. The coolest projects of Wikimedia Chapters – be inspired (Nicole Ebber; together with Lodewijk Gelauff)
  6. Idea Lab Brainstorm (Siko Bouterse & Heather Walls)
  7. Dev Camp (Sumana Harihareswara and others)
  8. Promoting diversity in the German Wikipedia (Ilona Buchem)
  9. Towards bridging the gender gap in Indian Wikimedia Community (Jadine Lannon & Netha Hussain)
  10. Bridging the gender gap with women scientists (Emily Temple-Wood)

Women participants in panel discussions were:

  1. Carmen Alcázar and Monica Mora in Wiki Loves Monuments
  2. Sumana Harihareswara in Transparency and Collaboration in Wikimedia Engineering

(This is an incomplete list. If you know a woman speaker at Wikimania 2013, feel free to tell me to add her name here)

Press

1. “Wikipedia fails to bridge gender gap” (South China Morning Post, 11 August 2013) by Keira Huang

2. “Women contributors still face hurdles at Wikipedia” (The Wall Street Journal, 19 August 2013) by Riva Gold