After doing some basic CRUD operations in my previous post, we’ll now have a look at little more advanced queries.

First we want to match all countries that have Turkish as official language. Let’s insert two countries where Turkish is spoken to a new collection:

> use advanced_query_db
> db.countries.insert(
    {name: "Germany", languages: [{name: "German", note: "official"}, {name: "Turkish"}]})
> db.countries.insert(
    {name: "Turkey", languages: [{name: "Turkish", note: "official"}, {name: "Kurdish"}]})

Now we have two countries inserted, Germany and Turkey. Now if we want to get the name of all countries with Turkish as official language, we can try following query:

> db.countries.find(
    {"languages.name": "Turkish", "languages.note": "official"},
    {"name": 1, "_id": 0}
  )
{ "name" : "Germany" }
{ "name" : "Turkey" }

Surprise: it returns Germany as well! So let’s have a closer look:

  1. we can access nested property by chaining them with a dot. This seemed to work.
  2. we search for all documents where we have a language with a name “Turkish”
  3. we search for all documents where we have a language with note “official”

The problem is: we never specified that 2. and 3. must happen on the same element of the language array, that’s why the query matches for Germany as we have a language with note “official” and a language with name “Turkish” in the languages array.

So we need to specify that the two conditions must match on the same query element. For this Mongo has an operator called “$elemMatch”. Using it in our query from above results in the following:

> db.countries.find(
    {"languages":
        { $elemMatch: {"name": "Turkish", "note": "official"} }
    },
    {"name": 1, "_id": 0})
{ "name" : "Turkey" }

This time the result is what we wanted. Note that as we have a condition on the attribute languages attribute directly, we don’t chain the attributes anymore.

Next we will add a few more countries to make our next query more interesting:

> db.countries.insertMany([
    {name: "Austria", languages: [
        {name: "German", note: "official"}]},
    {name: "Switzerland", languages: [
        {name: "German", note: "official"},
        {name: "French", note: "official"},
        {name: "Italian", note: "official"},
        {name: "Rumantsch", note: "official"}]},
    {name: "France", languages: [
        {name: "French", note: "official"}]}])
{
	"acknowledged" : true,
	"insertedIds" : [
		ObjectId("5b81c0e1e8bf539db5039fd7"),
		ObjectId("5b81c0e1e8bf539db5039fd8"),
		ObjectId("5b81c0e1e8bf539db5039fd9")
	]
}

Now that we have more countries, we want to return a list of all languages together with the countries in which they are an official language. For this we will call “aggregate” instead of “find” to the collection. With this function we can perform a sequence of of pipeline stages on the collection. You can find the full list of pipeline steps here.

As we can have more than one language, we first need an unwind statement, so that we get a single document for every language:

> db.countries.aggregate([
   {$unwind: "$languages"}
  ])
{ "_id" : ObjectId("5b81baaae8bf539db5039fd5"), "name" : "Germany", "languages" : { "name" : "German", "note" : "official" } }
{ "_id" : ObjectId("5b81baaae8bf539db5039fd5"), "name" : "Germany", "languages" : { "name" : "Turkish" } }
{ "_id" : ObjectId("5b81baf6e8bf539db5039fd6"), "name" : "Turkey", "languages" : { "name" : "Turkish", "note" : "official" } }
{ "_id" : ObjectId("5b81baf6e8bf539db5039fd6"), "name" : "Turkey", "languages" : { "name" : "Kurdish" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd7"), "name" : "Austria", "languages" : { "name" : "German", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "German", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "French", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "Italian", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "Rumantsch", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd9"), "name" : "France", "languages" : { "name" : "French", "note" : "official" } }

Now we have one document per language, we can filter for official languages. We do this with a $match pipeline step

> db.countries.aggregate([
    {$unwind: "$languages"},
    {$match: {"languages.note": "official"}}
  ])
{ "_id" : ObjectId("5b81baaae8bf539db5039fd5"), "name" : "Germany", "languages" : { "name" : "German", "note" : "official" } }
{ "_id" : ObjectId("5b81baf6e8bf539db5039fd6"), "name" : "Turkey", "languages" : { "name" : "Turkish", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd7"), "name" : "Austria", "languages" : { "name" : "German", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "German", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "French", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "Italian", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "Rumantsch", "note" : "official" } }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd9"), "name" : "France", "languages" : { "name" : "French", "note" : "official" } }

Now we only have only countries with one of their official languages. The next step is to add an attribute called “language” which contains the language name. This will replace our current nested “languages” attribute. We also want a the “name” attribute to be called country, so we add this field as well

> db.countries.aggregate([
    {$unwind: "$languages"},
    {$match: {"languages.note": "official"}},
    {$addFields:{"language": "$languages.name", "country": "$name"}}
  ])
{ "_id" : ObjectId("5b81baaae8bf539db5039fd5"), "name" : "Germany", "languages" : { "name" : "German", "note" : "official" },
    "language" : "German", "country" : "Germany" }
{ "_id" : ObjectId("5b81baf6e8bf539db5039fd6"), "name" : "Turkey", "languages" : { "name" : "Turkish", "note" : "official" },
    "language" : "Turkish", "country" : "Turkey" }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd7"), "name" : "Austria", "languages" : { "name" : "German", "note" : "official" },
    "language" : "German", "country" : "Austria" }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "German", "note" : "official" },
    "language" : "German", "country" : "Switzerland" }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "French", "note" : "official" },
    "language" : "French", "country" : "Switzerland" }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "Italian", "note" : "official" },
    "language" : "Italian", "country" : "Switzerland" }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd8"), "name" : "Switzerland", "languages" : { "name" : "Rumantsch", "note" : "official" },
    "language" : "Rumantsch", "country" : "Switzerland" }
{ "_id" : ObjectId("5b81c0e1e8bf539db5039fd9"), "name" : "France", "languages" : { "name" : "French", "note" : "official" },
    "language" : "French", "country" : "France" }

As we only want to see the “language” and “country” attributes, we add a projection step next:

> db.countries.aggregate([
    {$unwind: "$languages"},
    {$match: {"languages.note": "official"}},
    {$addFields:{"language": "$languages.name", "country": "$name"}},
    {$project: {"language": 1, "country": 1, "_id": 0}}
  ])
{ "language" : "German", "country" : "Germany" }
{ "language" : "Turkish", "country" : "Turkey" }
{ "language" : "German", "country" : "Austria" }
{ "language" : "German", "country" : "Switzerland" }
{ "language" : "French", "country" : "Switzerland" }
{ "language" : "Italian", "country" : "Switzerland" }
{ "language" : "Rumantsch", "country" : "Switzerland" }
{ "language" : "French", "country" : "France" }

Finally we need to group by country using the $group step.

> db.countries.aggregate([
    {$unwind: "$languages"},
    {$match: {"languages.note": "official"}},
    {$addFields:{"language": "$languages.name", "country": "$name"}},
    {$project: {"language": 1, "country": 1, "_id": 0}},
    {$group:{_id: "$language", countries: {$push: "$country"}}}
  ])
{ "_id" : "Rumantsch", "countries" : [ "Switzerland" ] }
{ "_id" : "German", "countries" : [ "Germany", "Austria", "Switzerland" ] }
{ "_id" : "French", "countries" : [ "Switzerland", "France" ] }
{ "_id" : "Italian", "countries" : [ "Switzerland" ] }
{ "_id" : "Turkish", "countries" : [ "Turkey" ] }

In a group by, you have to specify the attributes on which to group using the _id attribute. The other attributes specify how we do the aggregation. Here we used the $push aggregation operator which adds the attribute value to an array. Now we have a final result. Of course it can be made more beautiful, by doing renaming the _id field to language, but I’ll leave this as extra exercise to you as we already did something similar before.

Now you saw how to do a bit more complex query and especially the power of the aggregate function.

Actually we did some map reduce here: with the steps $unwind, $addFields and $project we did the map part and the group step was the reduce part, so we’ve just learned how MapReduce is done in MongoDB. Note that MongoDB also has explicit map and reduce functions, but it is usually better to use aggregation functions.