DHUM 25A43 - Fall 2025

Wikipedia API Workshop Instructions

Introduction

Imagine you’re a digital anthropologist tasked with studying the world’s major cities through the lens of collective human knowledge. Wikipedia, the modern Library of Alexandria, contains millions of interconnected articles written by people from every corner of the globe. But reading through pages manually would take years!

Today, you’ll learn to harness the Wikipedia API - a powerful tool that lets you programmatically access this vast repository of knowledge. Like a detective gathering clues, you’ll extract data about cities, uncover hidden connections between them, and discover patterns that would be impossible to spot by eye. By the end of this workshop, you’ll have built your own mini research database and gained insights into how these urban centers are documented and connected in our collective digital memory.

Your mission: Use code to explore, analyze, and visualize how Wikipedia represents the world’s great cities. Let’s begin your journey as a data explorer!

Good Habits

This workshop is about building intuition and familiarity with the tools: run every snippet yourself, tweak it, and look closely at what comes back.

Setup

  1. Open Google Colab
  2. Install the required library:
!pip install wikipedia-api

Note the ! before pip: it tells Colab to run the line as a shell command rather than as Python code.

Part 1: Basic Page Retrieval

Task 1: Get a Wikipedia Page

# Import the library
import wikipediaapi

# Instantiate the wiki object; the user_agent string identifies you to Wikipedia's servers
wiki = wikipediaapi.Wikipedia(user_agent="[email protected]", language='en')

# Try any topic: a person, a city, a country, a sport, a company, anything
page = wiki.page('Paris')

Task 2: Explore Page Properties

Print and examine these page attributes: exists(), title, fullurl, summary, text, links, and sections (a sketch follows below).
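
These are all real properties of a wikipediaapi page object. A minimal sketch of how to inspect them, reusing the page from Task 1:

print(page.exists())       # True if the title resolved to an article
print(page.title)          # article title
print(page.fullurl)        # canonical URL of the article
print(page.summary[:300])  # first 300 characters of the lead section
print(len(page.text))      # length of the full article text, in characters
print(len(page.links))     # number of outgoing links (a dict of title -> page)
print(len(page.sections))  # number of top-level sections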

Part 2: Bulk Data Collection

Task 3: Create a List of Topics

For instance, a list of cities; you can use any topic you want.

cities = ["Paris", "New York", "Tokyo", "London", "Berlin"]
  • [ and ] are used to create a list.
  • elements are separated by commas (,).
  • each element is a string, so it is written between double quotes ("---").

Task 4: Collect Multiple Data Points

For each city, retrieve: the page URL, title, summary, text length, link count, and section count (these become the DataFrame columns in Task 7; see the sketch below).
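
A minimal sketch of the collection loop, storing one dictionary per city in a list called records (the variable name is my choice, not from the handout):

records = []

for city in cities:
    page = wiki.page(city)
    if not page.exists():  # skip titles that don't resolve to an article
        continue
    records.append({
        "url": page.fullurl,
        "title": page.title,
        "summary": page.summary,
        "text_length": len(page.text),
        "link_count": len(page.links),
        "section_count": len(page.sections),
    })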

Part 3: Data Analysis

Task 5: Analyze Your Data
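
The handout leaves this task open-ended. One possible starting point, assuming the records list from Task 4, is to rank the cities by article size:

# Sort the cities by article length, longest first
for row in sorted(records, key=lambda r: r["text_length"], reverse=True):
    print(f"{row['title']}: {row['text_length']} characters, {row['link_count']} links")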

Task 6: Text Processing

Content analysis
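
One simple form of content analysis (a suggestion, not prescribed by the handout) is a word-frequency count over each summary:

from collections import Counter

# Crude frequency count: lowercase each summary and split on whitespace
for row in records:
    words = row["summary"].lower().split()
    print(row["title"], Counter(words).most_common(5))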

Part 4: Save and Visualize

Task 7: Create an Enhanced DataFrame

import pandas as pd

Build a DataFrame with columns: url, title, summary, text_length, link_count, section_count
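
A sketch of the construction, assuming the import above and the records list from Task 4; each dictionary's keys become the column names:

df = pd.DataFrame(records,
                  columns=["url", "title", "summary",
                           "text_length", "link_count", "section_count"])
df.head()  # preview the first rows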

Task 8: Basic Visualization
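
The handout doesn't name a specific chart; a minimal sketch using pandas' built-in matplotlib plotting, assuming the df from Task 7:

# Bar chart of article length per city; Colab renders the figure inline
df.plot.bar(x="title", y="text_length", legend=False,
            title="Wikipedia article length by city")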

Task 9: Export Your Work
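
A sketch of saving the DataFrame as a CSV file and downloading it from Colab (the file name cities.csv is a placeholder of my choosing):

# Write the DataFrame to disk, then trigger a browser download in Colab
df.to_csv("cities.csv", index=False)

from google.colab import files  # Colab-only helper
files.download("cities.csv")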

More General Tasks