Statistics on Gender Distribution⌗

We have a users collection, and we need to display the gender distribution of users in our BI. The data is as follows:

[
    {"name": "USER-01","gender": "male"},
    {"name": "USER-02","gender": "female"},
    {"name": "USER-03","gender": "male"},
    {"name": "USER-04","gender": "female"},
    {"name": "USER-05","gender": "male"},
    {"name": "USER-06","gender": "female"},
    {"name": "USER-07","gender": "female"},
    {"name": "USER-08","gender": "female"},
]

Implementation in Laravel:

use Jenssegers\Mongodb\Collection;

$data = User::query()->raw(function(Collection $collection) {
    return $collection->aggregate([
        [
            '$group' => [
                '_id' => '$gender',
                'count' => [
                    '$sum' => 1
                ]
            ]
        ],
        [
            '$project' => [
                '_id' => 0,
                'gender' => '$_id',
                'count' => 1
            ]
        ]
    ])->toArray();
});

return response()->json($data);

The result is as follows:

[
    {
        "gender": "male",
        "count": 3
    },
    {
        "gender": "female",
        "count": 5
    }
]

Statistics on Birth Year Distribution⌗

We have a users collection, and we need to display the birth year distribution of users in our BI. The data is as follows:

[
    {"name": "USER-01","birthday": ISODate("1998-04-24T00:00:00.000Z")},
    {"name": "USER-02","birthday": ISODate("1998-04-24T00:00:00.000Z")},
    {"name": "USER-03","birthday": ISODate("1997-04-24T00:00:00.000Z")},
    {"name": "USER-04","birthday": ISODate("2000-04-24T00:00:00.000Z")},
    {"name": "USER-05","birthday": ISODate("1991-04-24T00:00:00.000Z")},
    {"name": "USER-06","birthday": ISODate("1992-04-24T00:00:00.000Z")},
    {"name": "USER-07","birthday": ISODate("1992-04-24T00:00:00.000Z")},
    {"name": "USER-08","birthday": ISODate("1994-04-24T00:00:00.000Z")},
]

Implementation in Laravel:

use Jenssegers\Mongodb\Collection;

$data = User::query()->raw(function(Collection $collection) {
    return $collection->aggregate([
        [
            '$group' => [
                '_id' => [
                    '$dateToString' => [
                        'format' => '%Y',
                        'date' => '$birthday'
                    ]
                ],
                'count' => [
                    '$sum' => 1
                ]
            ]
        ],
        [
            '$project' => [
                '_id' => 0,
                'year' => '$_id',
                'count' => 1
            ]
        ]
    ])->toArray();
});

return response()->json($data);

The result is as follows:

[
    {
        "year": "1998",
        "count": 2
    },
    {
        "year": "1997",
        "count": 1
    },
    {
        "year": "2000",
        "count": 1
    },
    {
        "year": "1991",
        "count": 1
    },
    {
        "year": "1992",
        "count": 2
    },
    {
        "year": "1994",
        "count": 1
    }
]

Other Query Conditions⌗

If you need to add other query conditions, you can use $match in the Pipeline, as shown in the code below:

$result = User::query()->raw(function (Collection $collection) {
    return $collection->aggregate([
        [
            '$match' => [
                'status' => 'active',
                'created_at' => [
                    '$gte' => new UTCDateTime('2010-01-01'),
                    '$lte' => new UTCDateTime('2022-08-10')
                ]
            ]
        ],
        [
            '$group' => [
                '_id' => [
                    '$dateToString' => [
                        'format' => '%Y',
                        'date' => '$birthday'
                    ]
                ],
                'count' => [
                    '$sum' => 1
                ]
            ]
        ],
        [
            '$project' => [
                '_id' => 0,
                'year' => '$_id',
                'value' => '$count'
            ]
        ]
    ])->toArray();
});

The $match above indicates that only users with status active and registered between 2010-01-01 and 2022-08-10 will be queried.

Conclusion⌗

Although ORM is convenient and saves effort, it still cannot satisfy some complex queries. I don’t know if it’s because I’m getting older, but I’m increasingly fond of writing SQL or BSON queries by hand.

Learning MongoDB Aggregation Pipeline

Statistics on Gender Distribution⌗

Statistics on Birth Year Distribution⌗

Other Query Conditions⌗

Conclusion⌗