
Some notes from Mongo


Basics

  • MongoDB is optimized for scaling out: its document-oriented model lets it distribute data efficiently across multiple servers. Each document is self-contained, with the information it needs stored in one place, which reduces the need for the complex joins common in relational databases. Because a query can usually be answered from the documents themselves without consulting others, MongoDB can spread data across servers and retrieve it from the appropriate one, which makes scaling easier.

  • mongos routers distribute client queries to the correct shards, using metadata from the config servers. Deploying multiple mongos instances supports high availability and scalability.
  • As a rule of thumb, keep the number of mongos instances under 30. Why? Each mongos constantly talks to the config servers, and too many conversations slow everything down (like too many people asking the librarian questions at once).
  • Aggregation framework: Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. The framework provides features similar to SQL’s GROUP BY, related operators, and basic self-joins; it also allows data reshaping.
  • Change streams: Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment and react to them immediately. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.
  • Special collections and indexes: MongoDB features time-to-live (TTL) indexes for automatically expiring data, capped collections for maintaining recent data like logs, and partial indexes that index only documents meeting specific criteria, which improves efficiency and saves storage space.

    A TTL index automatically deletes documents after a specified time. You create it on a field containing a date (for example, createdAt), and MongoDB removes documents once that time expires. Use case: session data, cache entries, temporary logs, or anything that shouldn’t live forever. Equivalent concept: scheduled deletion or “auto-expiry” in SQL.
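A minimal sketch of the TTL index described above. The `sessions` collection, the `createdAt` field, and the one-hour lifetime are illustrative assumptions, and `db` is a database handle passed in as a parameter so the helper can be tried outside a live shell. The second function is only a plain-JavaScript model of the expiry rule, not MongoDB code.

```javascript
// Sketch: create a TTL index so documents expire one hour after `createdAt`.
// `sessions` and the 3600-second lifetime are hypothetical choices.
function createSessionTtlIndex(db) {
  return db.sessions.createIndex(
    { createdAt: 1 },             // the indexed field must hold a date
    { expireAfterSeconds: 3600 }  // lifetime measured from the field's value
  );
}

// Plain-JS model of the rule: a document is eligible for removal once
// createdAt + expireAfterSeconds lies in the past.
function isExpired(doc, expireAfterSeconds, now = new Date()) {
  if (!(doc.createdAt instanceof Date)) return false; // non-date values never expire
  return now.getTime() - doc.createdAt.getTime() >= expireAfterSeconds * 1000;
}
```

Deletion is not instantaneous: a background task removes expired documents periodically (about once a minute), so a document can briefly outlive its expiry time.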

  • Atlas Search: Built on Apache Lucene, Atlas Search is full-text search embedded in MongoDB Atlas, with features such as custom scoring and facets for fast, relevant results.
  • Capped collections: A capped collection is a fixed-size collection that behaves like a circular buffer. When it reaches its maximum size, the oldest documents are automatically overwritten by new ones.
    • You define a maximum size (in bytes) when creating it.
    • Documents are stored in insertion order and cannot be deleted (except by overwriting).
    • Ideal for logging, caching, or telemetry data where you only care about recent entries.
    • Fast inserts and reads because of the fixed size and no document relocation.

      db.createCollection("logs", { capped: true, size: 1048576 }) // 1MB capped collection
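The overwrite behaviour can be modelled in plain JavaScript. This is only an illustration of the circular-buffer idea, not how MongoDB stores data: the `CappedBuffer` class is hypothetical, and it caps by document count (similar in spirit to the optional `max` setting) rather than by bytes.

```javascript
// Hypothetical in-memory model of a capped collection.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);                  // insertion order is preserved
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();                  // the oldest document makes room
    }
  }

  find() {
    return [...this.docs];                // returned in insertion order
  }
}

const logs = new CappedBuffer(3);
for (const n of [1, 2, 3, 4, 5]) logs.insert({ n });
console.log(logs.find()); // [ { n: 3 }, { n: 4 }, { n: 5 } ]
```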
      
  • Time-series collections: A time-series collection is optimized for data that changes over time, such as metrics, IoT sensor data, or application monitoring.
    • Each document represents a measurement with a timestamp and metadata (like device ID).
    • MongoDB automatically organizes data internally into “buckets” for efficient storage and queries.
    • Supports automatic data compression, TTL expiration, and fast range queries on time fields.
    • Great for analytics and monitoring use cases. Example:

      db.createCollection("sensorData", {
        timeseries: {
          timeField: "timestamp",
          metaField: "sensorId",
          granularity: "minutes"
        }
      })
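As a rough mental model of the bucketing mentioned above, the plain-JavaScript sketch below groups measurements by their metadata value and by the hour they fall in. The `bucketKey` and `bucketize` helpers are hypothetical; MongoDB chooses its own internal bucket layout and boundaries, which differ from this.

```javascript
// Conceptual model only: group measurements by (sensorId, hour).
function bucketKey(doc) {
  const hour = new Date(doc.timestamp);
  hour.setUTCMinutes(0, 0, 0);            // truncate to the start of the hour
  return `${doc.sensorId}|${hour.toISOString()}`;
}

function bucketize(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = bucketKey(doc);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc);
  }
  return buckets;
}

const readings = [
  { sensorId: "a", timestamp: new Date("2025-01-01T10:05:00Z"), temp: 20.1 },
  { sensorId: "a", timestamp: new Date("2025-01-01T10:45:00Z"), temp: 20.4 },
  { sensorId: "a", timestamp: new Date("2025-01-01T11:02:00Z"), temp: 20.9 },
  { sensorId: "b", timestamp: new Date("2025-01-01T10:10:00Z"), temp: 18.3 },
];
const buckets = bucketize(readings);
console.log(buckets.size); // 3
```

In this model, a range query for one sensor only needs to look at that sensor's buckets for the relevant hours instead of scanning every measurement.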
      
  • Capped collections → fixed-size, high-speed, overwrite old data (like a log buffer).
  • Time-series collections → structured for chronological data, optimized for queries over time ranges.
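To make the aggregation framework note above concrete, here is a sketch of a small pipeline next to a plain-JavaScript function that computes the same thing in memory. The `orders` data and the `groupCount` helper are assumptions for illustration; in MongoDB the pipeline would be passed to `db.orders.aggregate(pipeline)`.

```javascript
// Roughly: SELECT status, COUNT(*) FROM orders WHERE amount >= 10 GROUP BY status
const pipeline = [
  { $match: { amount: { $gte: 10 } } },
  { $group: { _id: "$status", total: { $sum: 1 } } },
];

// In-memory stand-in for the two stages above (hypothetical helper).
function groupCount(docs, minAmount) {
  const totals = new Map();
  for (const doc of docs) {
    if (doc.amount < minAmount) continue;                       // $match
    totals.set(doc.status, (totals.get(doc.status) || 0) + 1);  // $group with $sum: 1
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

const orders = [
  { status: "shipped", amount: 25 },
  { status: "pending", amount: 5 },
  { status: "shipped", amount: 40 },
  { status: "pending", amount: 12 },
];
console.log(groupCount(orders, 10));
// [ { _id: 'shipped', total: 2 }, { _id: 'pending', total: 1 } ]
```

The in-memory version returns groups in first-seen order; a real `$group` stage makes no promise about output order unless a `$sort` stage follows it.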

Views and Materialized Views

Views: A view in MongoDB is a read-only, virtual collection defined by an aggregation pipeline. It doesn’t store data itself; every time you query it, MongoDB executes the underlying aggregation on the base collection.

Key points:

  • Always up to date (since it runs live on source data).
  • No additional storage cost.
  • Cannot be indexed directly; queries against a view use the indexes of the underlying collection.
  • Slower for large datasets because results are computed on each access.

Example:

db.createView("activeUsersView", "users", [
  { $match: { active: true } },
  { $project: { name: 1, email: 1 } }
])

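Materialized Views: MongoDB does not maintain materialized views automatically; you build and refresh them yourself, typically by writing the output of an aggregation pipeline to a collection (the $merge and $out stages exist for this). A materialized view is a precomputed, stored version of a query result, like a regular collection that you periodically refresh.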

Key points:

  • Stores actual data (uses disk space).
  • Faster query performance since data is precomputed.
  • Must be refreshed manually or via a scheduled process (not automatically updated).
  • Common in analytics or dashboard workloads.

Example (manual implementation):

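// Recompute the per-status order counts and store them in materializedOrders.
// $out replaces the target collection with the pipeline's results, so there is
// no separate drop() and insertMany(), and an empty result does not raise an error.
db.orders.aggregate([
  { $group: { _id: "$status", total: { $sum: 1 } } },
  { $out: "materializedOrders" }
])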
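An incremental alternative to rebuilding the whole collection is the `$merge` stage, which upserts pipeline results into a target collection. The sketch below is hedged: `refreshOrderTotals` is a hypothetical helper, and `db` is a database handle passed in as a parameter.

```javascript
// Sketch: refresh the stored totals in place instead of rebuilding them.
function refreshOrderTotals(db) {
  return db.orders.aggregate([
    { $group: { _id: "$status", total: { $sum: 1 } } },
    {
      $merge: {
        into: "materializedOrders", // target collection (created if missing)
        on: "_id",                  // match existing rows by status
        whenMatched: "replace",     // overwrite the stale total
        whenNotMatched: "insert",   // add rows for newly seen statuses
      },
    },
  ]);
}
```

One design consequence: `$merge` only touches rows the pipeline produces, so a status that no longer occurs in `orders` keeps its old row until it is removed some other way.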

Interview tip:

  • View: dynamic, no storage, slower, always current.
  • Materialized view: stored, faster, needs refresh, uses space.
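The trade-off can be mimicked in plain JavaScript: a view behaves like a function re-run on every read, while a materialized view behaves like a stored result that goes stale until it is refreshed. All names below are hypothetical and nothing here touches MongoDB.

```javascript
const users = [
  { name: "Ada", active: true },
  { name: "Bob", active: false },
];

// "View": recomputed from the source on every read.
const activeUsersView = () => users.filter((u) => u.active).map((u) => u.name);

// "Materialized view": a stored snapshot plus an explicit refresh step.
let activeUsersSnapshot = activeUsersView();
const refreshSnapshot = () => {
  activeUsersSnapshot = activeUsersView();
};

users.push({ name: "Cy", active: true });

console.log(activeUsersView());   // [ 'Ada', 'Cy' ]  (always current)
console.log(activeUsersSnapshot); // [ 'Ada' ]        (stale until refreshed)
refreshSnapshot();
console.log(activeUsersSnapshot); // [ 'Ada', 'Cy' ]
```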

Design

MongoDB stands out for its flexible schema, which offers a dynamic approach to data organization. Unlike traditional relational databases, which require a predefined schema to structure data, MongoDB allows documents within a collection to have different fields and data types. This flexibility handles evolving requirements smoothly and accommodates structured, semi-structured, and unstructured data, since each document carries its own structure.
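As an illustration of documents carrying their own structure, the sketch below puts differently shaped documents in one array (standing in for a collection) and tallies which fields appear. The `products` data and the `fieldCounts` helper are hypothetical.

```javascript
// Documents in one collection need not share fields or types.
const products = [
  { _id: 1, name: "Laptop", specs: { ram: "16GB" } },
  { _id: 2, name: "T-shirt", sizes: ["S", "M", "L"] },
  { _id: 3, name: "E-book", pages: 320, sizes: "n/a" }, // `sizes` is a string here
];

// Count how many documents contain each top-level field.
function fieldCounts(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const field of Object.keys(doc)) {
      counts[field] = (counts[field] || 0) + 1;
    }
  }
  return counts;
}

console.log(fieldCounts(products));
// { _id: 3, name: 3, specs: 1, sizes: 2, pages: 1 }
```

Code that reads such a collection has to tolerate missing fields and varying types. When some structure should be enforced, MongoDB also supports optional validation rules on a collection (for example with `$jsonSchema`).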



I seek refuge in God, from Satan the rejected. Generated by: Emacs 30.2 (Org mode 9.7.34). Written by: Salih Muhammed, by the date of: 2025-11-08 Sat 23:00. Last build date: 2025-11-09 Sun 03:43.