MongoDB Aggregation Framework and Mapper-Reducer Program

What is MongoDB?

MongoDB is an open-source document database and a leading NoSQL database, written in C++. It is categorized as NoSQL (Not only SQL) because it stores and retrieves data as JSON-like documents rather than as rows in tables.

What is MongoDB Aggregation Framework?

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.

MongoDB — Map Reduce

In MongoDB, map-reduce is a data processing model for running operations over large data sets and producing aggregated results. MongoDB provides the mapReduce() function for this. It takes two main functions: a map function and a reduce function. The map function emits a key-value pair for each document, and the reduce function combines all the values emitted for the same key into a single result. The data is mapped and reduced independently, the partial results are combined, and the final result is saved to the new collection you specify. mapReduce() is generally used on large data sets. With map-reduce you can compute aggregates such as max and avg per key, similar to GROUP BY in SQL. It processes the data independently and in parallel.
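To make the model concrete, here is a minimal sketch in plain JavaScript that runs outside MongoDB. It is not how MongoDB implements mapReduce(); the names docs, mapDoc, reduceValues and runMapReduce, and the three sample documents, are made up for illustration.

```javascript
// Hypothetical in-memory documents standing in for a collection.
const docs = [
  { country: "India", people: "Male" },
  { country: "France", people: "Female" },
  { country: "India", people: "Female" },
];

// Map phase: turn one document into a [key, value] pair.
function mapDoc(doc) {
  return [doc.country, 1];
}

// Reduce phase: collapse all values that share a key into one value.
function reduceValues(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Group the mapped pairs by key, then reduce each group.
function runMapReduce(documents, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of documents) {
    const [key, value] = mapFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const counts = runMapReduce(docs, mapDoc, reduceValues);
console.log(counts); // { India: 2, France: 1 }
```

The map step only decides which key each document belongs to; all the arithmetic happens in the reduce step.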

Here we are going to use the Aggregation Framework of MongoDB and create a mapper and reducer program.

The dataset I am using contains the fields First_name, Last_name, people (gender), and Country.

Here we are going to count the number of people from each country.

Now let's import the dataset into MongoDB.

mongoimport Your_data_set -d Databasename -c Collection --jsonArray

The import output confirms that the data was imported successfully, with 1000 documents.

Let’s connect to the MongoDB shell:

The mongo command opens the mongo shell:

mongo

Let's check whether the database was imported:

> show dbs

The database was imported successfully.

> show collections

The “countries” collection was created successfully.

We will count the people per country in two ways, both provided by MongoDB:

  1. Aggregation Pipeline
  2. Map-Reduce Function

Let’s start with the first method — Aggregation Pipeline:

db.countries.aggregate([{$group: {_id: {country: "$country"}, no_of_people: {$sum: 1}}}, {$sort: {"_id.country": 1}}])

{$group: {_id: {country: "$country"}} → group the documents by country

no_of_people: {$sum: 1} → add 1 for every document in the group, which gives the number of people in that country

{$sort: {"_id.country": 1}} → sort the groups by country name in ascending order

The result lists the number of people for each country.
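If it helps to see what a $group stage with {$sum: 1} computes, the following plain JavaScript sketch does the same counting on an in-memory array and orders the result by country name. It only illustrates the logic; sampleDocs and groupCountByCountry are made-up names, and the four documents are not from the real dataset.

```javascript
// Hypothetical sample documents; the real collection holds 1000 of them.
const sampleDocs = [
  { country: "India", people: "Male" },
  { country: "Brazil", people: "Female" },
  { country: "India", people: "Female" },
  { country: "France", people: "Male" },
];

function groupCountByCountry(documents) {
  const tally = new Map();
  for (const doc of documents) {
    // Same idea as {$sum: 1}: every document adds 1 to its group.
    tally.set(doc.country, (tally.get(doc.country) || 0) + 1);
  }
  // Shape each group like the pipeline output, then sort ascending by country.
  return [...tally.entries()]
    .map(([country, n]) => ({ _id: { country: country }, no_of_people: n }))
    .sort((a, b) =>
      a._id.country < b._id.country ? -1 : a._id.country > b._id.country ? 1 : 0
    );
}

const grouped = groupCountByCountry(sampleDocs);
console.log(grouped);
```

Each output object has the grouping key under _id and the count under no_of_people, the same shape the aggregate() call returns.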

Map-Reduce Function — Method-2

MongoDB uses the mapReduce command for map-reduce operations. Map-reduce is generally used for processing large data sets.

Declaring the map function:

var mapFunc1 = function() {
    // Emit one key-value pair per document: the country as key, 1 as value.
    emit(this.country, 1);
};

The map function runs once per document and emits the country name as the key with a value of 1, so every person contributes one count to their country.

Declaring the reduce function:

var ReduceFunc1 = function(keycountry, valuespeople) {
    // Sum the counts for this country. A sum stays correct even when
    // MongoDB passes earlier reduce results back in as values.
    return Array.sum(valuespeople);
};

The reduce function adds up the values the mapper emitted for each country, which gives the number of people (male and female) in that country.
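One detail worth checking in any reduce function: MongoDB may call it more than once for the same key and feed earlier reduce outputs back in as values, so it has to give the same answer however the values are batched. The plain JavaScript sketch below simulates that with a made-up helper, reduceInBatches, and compares a reduce that returns the array length with one that returns the sum. The key "India" and the five values are hypothetical.

```javascript
// Two candidate reduce functions for counting values per key.
const reduceByLength = (key, values) => values.length;
const reduceBySum = (key, values) => values.reduce((a, b) => a + b, 0);

// Five mapped values of 1 for one hypothetical key.
const emitted = [1, 1, 1, 1, 1];

// Simulate a re-reduce: reduce each batch on its own,
// then reduce the partial results once more.
function reduceInBatches(reduceFn, key, values, batchSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += batchSize) {
    partials.push(reduceFn(key, values.slice(i, i + batchSize)));
  }
  return reduceFn(key, partials);
}

const lengthResult = reduceInBatches(reduceByLength, "India", emitted, 3);
const sumResult = reduceInBatches(reduceBySum, "India", emitted, 3);

console.log(lengthResult); // 2: counts the two partial results, not the five people
console.log(sumResult);    // 5
```

Emitting 1 per document and summing in the reducer keeps the count correct in both the single-pass and the re-reduce case.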

Let's call the mapReduce function:

db.countries.mapReduce( 
mapFunc1,
ReduceFunc1,
{out: "map_reduced"}
)

The output of this function is saved in the map_reduced collection.

Let’s perform a query:

db.map_reduced.find().sort({ _id: 1 })

The results are sorted by country, and we have what we wanted: the number of people from each country.
