Learning MongoDB Aggregation Pipeline

Statistics on Gender Distribution⌗
We have a users collection, and we need to display the gender distribution of users in our BI. The data is as follows:
[
{"name": "USER-01","gender": "male"},
{"name": "USER-02","gender": "female"},
{"name": "USER-03","gender": "male"},
{"name": "USER-04","gender": "female"},
{"name": "USER-05","gender": "male"},
{"name": "USER-06","gender": "female"},
{"name": "USER-07","gender": "female"},
{"name": "USER-08","gender": "female"},
]
Implementation in Laravel:
use Jenssegers\Mongodb\Collection;
$data = User::query()->raw(function(Collection $collection) {
return $collection->aggregate([
[
'$group' => [
'_id' => '$gender',
'count' => [
'$sum' => 1
]
]
],
[
'$project' => [
'_id' => 0,
'gender' => '$_id',
'count' => 1
]
]
])->toArray();
});
return response()->json($data);
The result is as follows:
[
{
"gender": "male",
"count": 3
},
{
"gender": "female",
"count": 5
}
]
Statistics on Birth Year Distribution⌗
We have a users collection, and we need to display the birth year distribution of users in our BI. The data is as follows:
[
{"name": "USER-01","birthday": ISODate("1998-04-24T00:00:00.000Z")},
{"name": "USER-02","birthday": ISODate("1998-04-24T00:00:00.000Z")},
{"name": "USER-03","birthday": ISODate("1997-04-24T00:00:00.000Z")},
{"name": "USER-04","birthday": ISODate("2000-04-24T00:00:00.000Z")},
{"name": "USER-05","birthday": ISODate("1991-04-24T00:00:00.000Z")},
{"name": "USER-06","birthday": ISODate("1992-04-24T00:00:00.000Z")},
{"name": "USER-07","birthday": ISODate("1992-04-24T00:00:00.000Z")},
{"name": "USER-08","birthday": ISODate("1994-04-24T00:00:00.000Z")},
]
Implementation in Laravel:
use Jenssegers\Mongodb\Collection;
$data = User::query()->raw(function(Collection $collection) {
return $collection->aggregate([
[
'$group' => [
'_id' => [
'$dateToString' => [
'format' => '%Y',
'date' => '$birthday'
]
],
'count' => [
'$sum' => 1
]
]
],
[
'$project' => [
'_id' => 0,
'year' => '$_id',
'count' => 1
]
]
])->toArray();
});
return response()->json($data);
The result is as follows:
[
{
"year": "1998",
"count": 2
},
{
"year": "1997",
"count": 1
},
{
"year": "2000",
"count": 1
},
{
"year": "1991",
"count": 1
},
{
"year": "1992",
"count": 2
},
{
"year": "1994",
"count": 1
}
]
Other Query Conditions⌗
If you need to add other query conditions, you can use $match
in the Pipeline, as shown in the code below:
$result = User::query()->raw(function (Collection $collection) {
return $collection->aggregate([
[
'$match' => [
'status' => 'active',
'created_at' => [
'$gte' => new UTCDateTime('2010-01-01'),
'$lte' => new UTCDateTime('2022-08-10')
]
]
],
[
'$group' => [
'_id' => [
'$dateToString' => [
'format' => '%Y',
'date' => '$birthday'
]
],
'count' => [
'$sum' => 1
]
]
],
[
'$project' => [
'_id' => 0,
'year' => '$_id',
'value' => '$count'
]
]
])->toArray();
});
The $match
above indicates that only users with status active
and registered between 2010-01-01
and 2022-08-10
will be queried.
Conclusion⌗
Although ORM is convenient and saves effort, it still cannot satisfy some complex queries. I don’t know if it’s because I’m getting older, but I’m increasingly fond of writing SQL or BSON queries by hand.