Guided Practice on MongoDB Atlas

In this document we will work on MongoDB Atlas, a hosted MongoDB service to create a database, load documents and execute queries.

In a second part, we will see how to work on larger databases with mongosh in the terminal.

At the end of this session, you should have a good understanding of MongoDB’s query language (MQL).

Objectives:

Working with MongoDB

There are several ways to work with MongoDB

On your local machine

The MongoDB Shell, mongosh, is a JavaScript and Node.js REPL (Read-Eval-Print Loop) environment for interacting with MongoDB servers.

You can also work on MongoDB with your preferred scripting language: python, go, ruby, PHP, Java, etc…

Today, we are working on the hosted Atlas version.

MongoDB Atlas gives a hosted cluster on which we can create a database, import sample collections and understand how to perform CRUD operations in MongoDB.

CRUD means Create, Read, Update and Delete: the four basic operations for a DBMS

Atlas

tldr; We’re going to

  1. create a cluster
  2. create a database
  3. add data.

The trick is finding where to click on the UI. Then we’ll explore the pre-loaded data sample that has more data.

Account and Cluster

Let’s start by creating an account on Atlas https://www.mongodb.com/cloud/atlas/register

⏱️⏱️⏱️

Then:

Create a cluster

On the next screen

You can close the “Connect popup”. We’ll come back to it later.

Important

add IP address

Create a database

Then create a MongoDB database

or in:

with following info

Note: do not confuse

Then insert the following Document (right side : INSERT DOCUMENT)

you can delete the existing _id

{
  "_id":{
    "$oid":"6745ab6f0e0bbdab062667c7"
    },
    "title": "Happy",
    "artist": "Pharrell Williams",
    "year": 2013,
    "mood": "joyful"
}

insert a document

And insert another one

{
  "_id":{
    "$oid":"682edb5c5ee7135cebb21b2f"
  },
  "title": "Galvanize",
  "artist": "The Chemical Brothers",
  "year": 2004,
  "mood": "electric"
}

In both cases, the value of the _id is arbitrary. The tool proposes a default one with another value.

We now have 2 documents!

2 documents, WOOOOOO! exciting 🤣🤣🤣 !

Let’s explore this magnificent MongoDB database.

Queries in MongoDB

In MongoDB, writing a query comes down to writing JSON

json query
{} returns all documents
{ field : value } where field = value
{ field : { $lt : value } } where field <= value (lt : less than)

To return all the documents in our songs collection, just write {} in the “Filter” query field

This returns the 2 songs, the 2 documents.

{} returns the whole dataset

And the query {year : {$lt : 2010}} returns the song that was published before 2010.

{ year : {$lt : 2010} } published before 2010.

Moaaaaaarrrr data!!!

We want more data

2 documents is good but we need more data to really play.

Let’s now import the sample datasets provided by ATLAS.

Note: it may be already loaded. Check in Clusters > Browse Collections.

If you do not see the database sample_mflix in the collection list then load it.

Go to Clusters > dots > load sample dataset

load sample dataset

Keep everything or just select sample_mflix (may be faster)

⏳⏳⏳

Then click on Clusters) > Browse collections to see the available databases.

You get a new database movies_mflix with 5 collections.

Atlas sample databases

Let’s look at the embedded_movies collection which contains 1525 documents and observe the structure of a document

A document is a JSON record

It can have:

The primary key of a collection is always “_id”.

{
    "_id": {
        "$oid": "573a1390f29313caabcd5293"
    },
    "plot": "Young Pauline is left a lot of money when her wealthy uncle dies. However, her uncle's secretary has been named as her guardian until she marries, at which time she will officially take ...",
    "genres": [
        "Action"
    ],
    "runtime": {
        "$numberInt": "199"
    },
    "cast": [
        "Pearl White",
        "Crane Wilbur",
        "Paul Panzer",
        "Edward Josè"
    ],
    "num_mflix_comments": {
        "$numberInt": "0"
    },
    "poster": "https://m.media-amazon.com/images/M/MV5BMzgxODk1Mzk2Ml5BMl5BanBnXkFtZTgwMDg0NzkwMjE@._V1_SY1000_SX677_AL_.jpg",
    "title": "The Perils of Pauline",
    "fullplot": "Young Pauline is left a lot of money when her wealthy uncle dies. However, her uncle's secretary has been named as her guardian until she marries, at which time she will officially take possession of her inheritance. Meanwhile, her \"guardian\" and his confederates constantly come up with schemes to get rid of Pauline so that he can get his hands on the money himself.",
    "languages": [
        "English"
    ],
    "released": {
        "$date": {
            "$numberLong": "-1760227200000"
        }
    },
    "directors": [
        "Louis J. Gasnier",
        "Donald MacKenzie"
    ],
    "writers": [
        "Charles W. Goddard (screenplay)",
        "Basil Dickey (screenplay)",
        "Charles W. Goddard (novel)",
        "George B. Seitz",
        "Bertram Millhauser"
    ],
    "awards": {
        "wins": {
            "$numberInt": "1"
        },
        "nominations": {
            "$numberInt": "0"
        },
        "text": "1 win."
    },
    "lastupdated": "2015-09-12 00:01:18.647000000",
    "year": {
        "$numberInt": "1914"
    },
    "imdb": {
        "rating": {
            "$numberDouble": "7.6"
        },
        "votes": {
            "$numberInt": "744"
        },
        "id": {
            "$numberInt": "4465"
        }
    },
    "countries": [
        "USA"
    ],
    "type": "movie",
    "tomatoes": {
        "viewer": {
            "rating": {
                "$numberDouble": "2.8"
            },
            "numReviews": {
                "$numberInt": "9"
            }
        },
        "production": "Pathè Frères",
        "lastUpdated": {
            "$date": {
                "$numberLong": "1441993579000"
            }
        }
    }
}

The ATLAS user interface is well done but I prefer working with code rather than a web interface.

Let’s move to mongosh to explore this movie database and learn how to make queries in MongoDB.

Connecting via your language

Atlas allows you to connect to your cluster either with a driver for your favorite language (python, …) or with the shell.

connect to your cluster

Connecting with Mongosh

Save the connection string in an environment variable

mongodb+srv://<username>:<password>@<cluster_id>.mongodb.net/

For example

export MONGO_ATLAS_URI=mongodb+srv://<username>:<password>@<cluster_id>.mongodb.net/

Then connect with

mongosh ${MONGO_ATLAS_URI}

Connecting in Python

We need the pymongo package

pip install pymongo

The connection string is

connection_string = "mongodb+srv://<username>:<db_password>@<cluster_id>.mongodb.net

Note: If you work in python, it’s better to put the connection string as an environment variable ($MONGO_ATLAS_URI), in a .env file. then load it with dotenv

import os
from dotenv import load_dotenv

load_dotenv()

MONGO_ATLAS_URI = os.getenv('MONGO_ATLAS_URI')

And then instantiate the client with

from pymongo import MongoClient

client = MongoClient(MONGO_ATLAS_URI)

Once we have a client, we can connect to the database

db = client["sample_mflix"]

then instantiate a collection

collection = db["movies"]

The collection is of class pymongo.synchronous.collection.Collection and has many methods:

many collections methods

You cna then work directly in a python script file, in a Jupyter notebook (Google Colab, local) or in your terminal with iPython.

Using the terminal with mongosh

How to install mongosh: https://www.mongodb.com/docs/mongodb-shell/install/

The language used in the MongoDB shell (mongosh) is JavaScript.

mongosh is an interactive JavaScript interface for MongoDB, allowing you to interact with your MongoDB instances, execute queries or perform administrative tasks in JavaScript.

mongosh also supports many JavaScript features including the use of variables, loops and functions.

Here are some examples of using JavaScript in mongosh:

Local

mongosh "mongodb://localhost:27017"

On Atlas

mongosh ${MONGO_ATLAS_URI}

where

export MONGO_ATLAS_URI=mongodb+srv://<username>:<password>@<cluster_id>.mongodb.net/

Once connected to an instance

show dbs

returns

Atlas atlas-a12t6r-shard-0 [primary] test> show dbs
sample_mflix  126.52 MiB
songsdb        72.00 KiB
admin         348.00 KiB
local           3.20 GiB
use sample_mflix

!! Attention !! If you try to use a database that does not exist, no error is raised.

show collections

list of operations / methods on a database

list of operations / methods on a collection

db.songs.insertOne(
    {
        "title": "Green Onions",
      "artist": "Booket T and the MGs",
      "year": 1967,
      "mood": "[blues, soul]"
  }
)
db.users.updateOne(
{ "artist": "Booket T and the MGs" },
{ $set: { "artist": "Booker T and the MGs" } }
);
db.songs.find({ year: { $lt: 2010 } })
db.songs.deleteOne({ title: "Green Onions" })

Work with the whole javascript language in mongosh

You can set variables for instance

var  max_year= 2010;
db.songs.find({ year: { $lt: max_year }})
mongosh < script.js

or

mongosh --file script.js

and within a mongosh session

> load('path/to/script.js')

mongosh is a powerful tool for querying and managing your MongoDB databases.

Let’s switch to the sample_mflix database.

use sample_mflix

then

show collections

Difference between queries in python and javascript

python: double quotes around fields and operators

db.movies.find(
    {"runtime": {"$gt" : 180}},  // Filter on movie duration
    { "_id": 0, "title": 1, "runtime": 1, "imdb.rating": 1 }  // Projection to include title and imdb.rating, exclude _id
)

mongosh: no need for quotes

db.movies.find(
    {runtime: {$gt : 180}},  // Filter on movie duration
    { _id: 0, title: 1, runtime: 1, "imdb.rating": 1 }  // Projection to include title and imdb.rating, exclude _id
)

Filtering - query predicate

https://www.mongodb.com/docs/manual/reference/glossary/

The JSON clause that specifies the filtering arguments is called a query predicate.

A query predicate returns a boolean that indicates whether a document matches the specified query.

Main functions on collections

function returns
find() all documents
find_one() the 1st document
distinct("<field>") list of distinct values for the <field>
count_documents({}) number of documents for the collection or returned by the filter in the query predicate

Also note

You can chain these methods with limit and sort

db.movies.find({runtime: {$gt: 120}}).limit(3)

Cursor

The result of a find query is a cursor.

cursor = db.movies.find({})

A cursor is a pointer to a MongoDB query result set.

Projection

In database language, projecting means selecting a subset of all possible fields.

In SQL, you simply list the column names

select genres, plot from movies;

In MongoDB, you must specify the fields in a JSON object, right after the query predicate

db.movies.find(
    {runtime: {$gt : 180}},  // Filter on movie duration
    { _id: 0, title: 1, runtime: 1, "imdb.rating": 1 }  // Projection to include title and imdb.rating, exclude _id
)

Here the projection is expressed by: ({ _id: 0, title: 1, runtime: 1, "imdb.rating": 1 })

The query returns

{ runtime: 240, title: 'Napoleon', imdb: { rating: 7.4 } },
{ runtime: 281, title: 'Les Misèrables', imdb: { rating: 7.9 } },
{ runtime: 245, title: 'Flash Gordon', imdb: { rating: 7.3 } },
{ runtime: 238, title: 'Gone with the Wind', imdb: { rating: 8.2 } },
...

Number of documents

db.movies.countDocuments({ "imdb.rating": { $gt: 8.0 } })

Notice how we query the embedded fields : “imdb.rating”

Your turn

Write the filter and projection for the following queries

  1. Find movies with an IMDb rating greater than 8
    • filter: {"imdb.rating": {"$gt" : 8}}
    • projection: {"_id": 0, "title": 1, "imdb.rating": 1}
  2. Movies released after 2000
  3. Movies with a specific director: “Christopher Nolan”. Show title, director and year
  4. Retrieve movies with tomatoes.viewer.rating > 4.0, showing title and viewer rating.
  5. Find movies that contain "Comedy" and "Drama" in the genres array. Use {$all: [list of genres]}
  6. Find movies that contain "Comedy" OR "Drama" in the genres array. Use {$all: [list of genres]}
  7. Movie query on a year range: Retrieve movies released between 1990 and 2000, showing title and year.
  8. Movie query with missing fields: Find movies where the fullplot field does not exist. Use $exists.
  9. Find all distinct genres
    • use db.movies.distinct("genres")
  10. top 5 most frequent genres
    • use the $unwind operator in combination withe $group operator
       {"$unwind": "$genres"},
       {"$group": {"_id": "$genres", "count": {"$sum": 1}}},
      

Conclusion

In this session you practiced:

In the next session, we will dive deeper into MongoDB and look at more complex ways to query data using aggregation pipelines. We will also cover schema design and validation.

To go further

For next time, you can:

and many more