
Mongo locally vs on the cloud
We’ve been building GraphQL-based implementations for a couple of clients, and until this month, the volume of usage in the MongoDB database behind the scenes had been very light. Recently, our client wanted to lift-and-shift their architecture onto a MongoDB Atlas serverless instance. OK, no problem. That’s why we built on Dockerized containers, orchestrated by Kubernetes. We had them switched over in a UAT environment in under a day with a few ConfigMap changes and some updated DevOps scripts.
However, with continued usage of the system, we noticed a dramatic slow-down in our pages. At first, we assumed this was a general slow-down from moving our database to “the cloud” (whereas previously Mongo sat on the same vnet as the backend in Kubernetes, for lightning ⚡️ speed).
Upon closer inspection, we saw only a minor general slow-down, but one page in particular was unbearably slow. Some slowdown was expected: going over the Internet introduces a new source of latency into the operational architecture. But why was this one page so slow?
Graphs in GraphQL
The first thing we wanted to do was blame MongoDB Atlas and switch the client back to local Kubernetes Mongo instances. But we needed data to make a good decision. The Mongoose library for Node.js has a debug mode, which we set up like so:
import mongoose from 'mongoose';
// ... initialize things here ...
if (process.env.NODE_ENV === 'development') {
  mongoose.set('debug', { shell: true });
}
It didn’t take long to see that we had a problem in our backend. Simple queries were returning in reasonable amounts of time, but a page that listed up to 100 transactions in a data grid was the culprit: it was responsible for around 260 independent DB queries through the Mongoose driver! Even just 20 ms of network latency adds up in that kind of scenario. 20 ms × 260 = 5,200 ms, which would make most users feel like this…

GraphQL crash course
At this point, it might be helpful to understand a little about how GraphQL works. It’s a fairly delightful technology for medium-sized projects, because you get a large amount of control over the “shape” of the data returned from your backend API call. Here are some examples to whet your appetite: https://graphql.org/learn/queries/
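As a quick illustration (against a hypothetical schema, not our real one), the client asks for exactly the fields it needs, and the response mirrors that shape:

```graphql
# Hypothetical schema: the UI requests only the fields this screen needs
query {
  transactions(limit: 2) {
    id
    amount
    departmentDetail {
      dept
    }
  }
}
```

Nothing more comes back than what was requested, which is what makes the “shape control” so appealing.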

It can really be characterized as a type of inversion-of-control pattern, one that keeps your UI component’s data needs close to the business requirements for how to present that data. Here’s a sample from our app:
Frontend – GraphQL Call
// src/vuex/store/actions.js
import Vue from 'vue';
import Vuex from 'vuex';
import gql from 'graphql-tag';
import { clients } from '../../util/clients';

const { backend } = clients.direct; // Apollo GQL client
// ...
// suppose these variables are passed in
const queryData = {
  departments: [/* ... */],
  wildcard: 'Bank of ABC',
  sortBy: [{ type: 'name', direction: 'asc' }],
};

// and suppose our UI needs a few properties passed back
const transactionGridColumns = `id, type, fiscalYear,
  departmentDetail {
    dept, description
  },
  hasFiles,
  voucherAddress { address1, address2, city, state, zip },
  amount, status, remarks`;

// so we call GraphQL for our data
const results = await backend.query({
  query: gql`query searchTransactions($departments: [ID], $wildcard: String,
      $skip: Int, $limit: Int, $sortBy: [SortBy]) {
    odtransactions(departments: $departments, wildcard: $wildcard,
        skip: $skip, limit: $limit, sortBy: $sortBy) {
      ${transactionGridColumns}
    }
  }`,
  variables: queryData,
  fetchPolicy: 'no-cache',
});
And that’s all great! Except for when it’s not.
You see, GraphQL can appear deceptively simple. As the client-side caller, you may not be aware when you’ve asked the API for data from another table/collection behind the scenes. Two properties in particular were giving us grief in the call above: hasFiles and departmentDetail.
Study this excerpt from the backend code for a moment:
Backend – GraphQL Server
export const ODTransactionType = new GraphQLObjectType({
  name: 'ODTransaction',
  fields: () => ({
    ..._ODTransactionTypeFields,
    departmentDetail: {
      type: ODDeptType,
      // trouble - needs dataloader for batching / de-duping 🔥🔥🔥
      resolve: ({ department }) => ODDepts.findById(department),
    },
    voucherAddress: { type: AddressType },
    hasFiles: {
      type: GraphQLBoolean,
      resolve: async ({ _id }) => {
        // trouble - needs dataloader for batching / de-duping 🔥🔥🔥
        const { db } = mongo.connection;
        const fileData = await db.collection('OD.files').find({
          'metadata.program': 'OD',
          'metadata.parentObjectType': 'ODTransaction',
          'metadata.parentObjectId': _id.toString(),
        }).toArray();
        return fileData.length > 0;
      },
    },
  }),
});
(note the 🔥🔥🔥 lines above)
The properties hasFiles and departmentDetail function almost like computed properties in Vue.js: they’re resolved within the context of each transaction being returned. This explains the 200+ calls, because we had two points where we needed info from a collection other than the transaction collection. Given 100 transactions, we would make 2 extra calls per transaction to fully resolve the GraphQL graph.
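To make that arithmetic concrete, here’s a minimal sketch (mock lookups and data, not our real resolvers) of how per-row field resolvers multiply into round trips:

```javascript
// Minimal sketch: two resolved fields, each costing one lookup per parent row.
let queryCount = 0;

// stand-ins for the departmentDetail and hasFiles lookups
const lookupDepartment = (id) => { queryCount += 1; return { _id: id }; };
const lookupFiles = (id) => { queryCount += 1; return []; };

// a page of 100 transactions spread across only 7 departments
const transactions = Array.from({ length: 100 }, (_, i) => ({
  _id: i,
  department: i % 7,
}));

// resolving both fields for every row fans out into 2 lookups per row
const page = transactions.map((t) => ({
  ...t,
  departmentDetail: lookupDepartment(t.department),
  hasFiles: lookupFiles(t._id).length > 0,
}));

console.log(queryCount); // 200
```

Two hundred lookups for a single page, before the base query is even counted.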
Enter GraphQL DataLoader
The DataLoader library was written for GraphQL, but it would actually work in a variety of use cases similar to what’s described above, even in a RESTful environment. Here’s the project’s self-declared purpose:
DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.
Yes! We want that. We need to de-duplicate and cache, because many of these transactions may share the same departments. Suppose there are only 7 unique departments across 100 transactions. That’s a huge performance opportunity.
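Here’s what that opportunity looks like in miniature (hypothetical ids, no real Mongo): the 100 department keys gathered from a page collapse to 7 unique keys, which can be satisfied by a single `$in` query.

```javascript
// 100 department keys collected from a page of transactions...
const departmentKeys = Array.from({ length: 100 }, (_, i) => `dept-${i % 7}`);

// ...collapse to 7 unique keys after de-duplication
const uniqueKeys = [...new Set(departmentKeys)];

// which one batched query could satisfy, e.g. (hypothetical model):
// ODDept.find({ _id: { $in: uniqueKeys } })
console.log(uniqueKeys.length); // 7
```

One round trip instead of 100 individual findById calls.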
Marc-André has a really nice visualization for DataLoader, which explains it well if you’re looking for further reading. But it essentially boils down to this:
Without DataLoader:

But with DataLoader, we can substitute those dependent promises to other data sources with promises that resolve only after several have been batched together and returned. Your calling code simply calls a DataLoader function instead of a query function; the rest of the experience is identical, and the calling code is none the wiser.
With DataLoader:

It’s not difficult to imagine how, at scale, this boosts performance. After implementing a DataLoader for these two problem points, our page load time went from 3-6 seconds down to 400 ms. 🎉 Not the best it can be, but totally worth some light refactoring! And we now have a pattern for further, incremental improvement to the app.
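If you’re curious what DataLoader is doing under the hood, the batching trick boils down to deferring the fetch by one microtask so that all the loads issued in the same tick share a single batch call. This is a simplified hand-rolled sketch, not the real DataLoader API (which also adds per-key caching and error handling):

```javascript
// Minimal sketch of microtask batching: load() queues a key and returns a
// promise; the batch function runs once with all keys queued in this tick.
function makeLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        if (queue.length === 0) {
          // first load of this tick: schedule a flush on the next microtask
          queueMicrotask(async () => {
            const batch = queue;
            queue = [];
            try {
              const results = await batchFn(batch.map((q) => q.key));
              batch.forEach((q, i) => q.resolve(results[i]));
            } catch (err) {
              batch.forEach((q) => q.reject(err));
            }
          });
        }
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Usage: three loads in the same tick become one batch call
let batchCalls = 0;
let loadedValues;
const loader = makeLoader(async (keys) => {
  batchCalls += 1; // one round trip for the whole batch
  return keys.map((k) => `dept:${k}`);
});

Promise.all([loader.load(1), loader.load(2), loader.load(3)]).then((values) => {
  loadedValues = values;
  console.log(batchCalls, values); // one batch call, three resolved values
});
```

The callers each get back an ordinary promise, which is why swapping a query function for a loader function is transparent to the calling code.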
The Fix
Here’s an example of what that looked like on the backend, in our pull request:
// odDeptLoader.js
import DataLoader from 'dataloader';
import { ODDept } from '../../mongo/models/OD/ODDept.js';

const odDeptLoaderFn = async (keys) => {
  const odDepts = await ODDept.find({
    _id: { $in: keys },
  });
  const odDeptsByKeys = odDepts.reduce((prev, curr) => ({
    ...prev,
    [curr._id]: curr,
  }), {});
  return keys.map((k) => odDeptsByKeys[k]);
};

export const odDeptLoader = () => new DataLoader(odDeptLoaderFn);

export default {
  odDeptLoader,
};
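The keys.map at the end matters: DataLoader requires the batch function to return one result per key, in the same order as the keys it was given, and a `$in` query makes no ordering guarantee. A quick sketch with mock data (hypothetical ids and names) shows what that re-alignment step does:

```javascript
// Results from a $in query can come back in any order, so we re-align them
// to the requested key order; a key with no document yields undefined.
const keys = ['b', 'a', 'c']; // ids requested by the loader, in call order
const found = [               // what find({ _id: { $in: keys } }) might return
  { _id: 'a', name: 'Finance' },
  { _id: 'b', name: 'Parks' },
  // note: no document for 'c'
];

const byId = found.reduce((prev, curr) => ({ ...prev, [curr._id]: curr }), {});
const aligned = keys.map((k) => byId[k]);

console.log(aligned.map((d) => d && d.name)); // [ 'Parks', 'Finance', undefined ]
```

Skip that step and every caller could receive some other caller’s department.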
And the GraphQL refactor (just showing one resolver, for brevity):
export const ODTransactionType = new GraphQLObjectType({
  name: 'ODTransaction',
  fields: () => ({
    ..._ODTransactionTypeFields,
    departmentDetail: {
      type: ODDeptType,
      // one-liner change
      resolve: ({ department }, _, { contextLoaders }) => contextLoaders.OD.odDeptLoader.load(department),
    },
  }),
});
Easy peasy! So don’t hesitate to use DataLoader in your project. It can seem intimidating as a concept, but it’s actually quite delightful to implement.