Friday, October 10, 2014

Setting Up a MongoDB Cluster

I recently setup a MongoDB cluster on my workstation.  While the Mongo documentation is very good, the proper setup is scattered between the Replication and Sharding documentation.  I figured this might be helpful to others working with Mongo for the first time.  Obviously this is not a valid production setup!  For this, you'd need to place the replication sets, shard servers and config servers on separate machines.

The commands rs.help() and sh.help() come in handy along the way.  Additional useful commands include rs.status(), rs.printReplicationInfo(), rs.printSlaveReplicationInfo(), sh.status(), and db.{$collection-name}.getShardDistribution().

Config Server Setup for 3 Nodes
1. Create a data directory for each of the three config server nodes.
mkdir c:\data\cluster\config\node1
mkdir c:\data\cluster\config\node2
mkdir c:\data\cluster\config\node3

2. Start the 3 config server nodes.
mongod --configsvr --dbpath c:\data\cluster\config\node1 --bind_ip ${machine-name} --port 27020
mongod --configsvr --dbpath c:\data\cluster\config\node2 --bind_ip ${machine-name} --port 27021
mongod --configsvr --dbpath c:\data\cluster\config\node3 --bind_ip ${machine-name} --port 27022

Start 2 MongoDB Shard Servers (mongos) Used By Client Apps
1. Start the 2 mongos instances.
mongos --configdb ${machine-name}:27020,${machine-name}:27021,${machine-name}:27022 --port 27017
mongos --configdb ${machine-name}:27020,${machine-name}:27021,${machine-name}:27022 --port 27018

Replication Set (RS) and Shard Setup for Shard 1
1. Create the data directory for each node.
mkdir c:\data\cluster\node1
mkdir c:\data\cluster\node2
mkdir c:\data\cluster\node3

2. Start the 3 nodes.
mongod --dbpath c:\data\cluster\node1 --bind_ip ${machine-name} --port 27000 --replSet "rs1"
mongod --dbpath c:\data\cluster\node2 --bind_ip ${machine-name} --port 27001 --replSet "rs1"
mongod --dbpath c:\data\cluster\node3 --bind_ip ${machine-name} --port 27002 --replSet "rs1"

3. Connect to one of the nodes.
mongo --host ${machine-name} --port 27000

4. Create the replication set.
rs.initiate()

5. Add the second node to the replication set.
rs.add("${machine-name}:27001")

6. Add the third node to the replication set.
rs.add("${machine-name}:27002")

7. Validate your replication setup.  You should see JSON output with a single PRIMARY and two SECONDARY records.
rs1:PRIMARY> rs.status()
{
        "set" : "rs1",
        "date" : ISODate("2014-10-10T20:44:52Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "l7a973:27000",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1989,
                        "optime" : Timestamp(1412972013, 1),
                        "optimeDate" : ISODate("2014-10-10T20:13:33Z"),
                        "electionTime" : Timestamp(1412971998, 2),
                        "electionDate" : ISODate("2014-10-10T20:13:18Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "l7a973:27001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1882,
                        "optime" : Timestamp(1412972013, 1),
                        "optimeDate" : ISODate("2014-10-10T20:13:33Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:44:50Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:44:51Z"),
                        "pingMs" : 0,
                        "syncingTo" : "l7a973:27000"
                },
                {
                        "_id" : 2,
                        "name" : "l7a973:27002",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1879,
                        "optime" : Timestamp(1412972013, 1),
                        "optimeDate" : ISODate("2014-10-10T20:13:33Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:44:51Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:44:50Z"),
                        "pingMs" : 0,
                        "syncingTo" : "l7a973:27000"
                }
        ],
        "ok" : 1
}

8.  Connect to one of the shard servers (mongos).
mongo --host ${machine-name} --port 27017

9. Add the replication set to the shard.
sh.addShard("rs1/${machine-name}:27000")

10. Enable sharding for the database (I'm using "ngi" for my database name).
sh.enableSharding("${database-name}")

Replication Set (RS) and Shard Setup for Shard 2 
1. Create the data directory for each node.
mkdir c:\data\cluster\node4
mkdir c:\data\cluster\node5
mkdir c:\data\cluster\node6

2. Start the 3 nodes.
mongod --dbpath c:\data\cluster\node4 --bind_ip ${machine-name} --port 27010 --replSet "rs2"
mongod --dbpath c:\data\cluster\node5 --bind_ip ${machine-name} --port 27011 --replSet "rs2"
mongod --dbpath c:\data\cluster\node6 --bind_ip ${machine-name} --port 27012 --replSet "rs2"

3. Connect to one of the nodes.
mongo --host ${machine-name} --port 27010

4. Create the replication set.
rs.initiate()

5. Add the second node to the replication set.
rs.add("${machine-name}:27011")

6. Add the third node to the replication set.
rs.add("${machine-name}:27012")

7. Validate your replication setup.  You should see JSON output with a single PRIMARY and two SECONDARY records.
rs2:PRIMARY> rs.status()
{
        "set" : "rs2",
        "date" : ISODate("2014-10-10T20:46:15Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "l7a973:27010",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1642,
                        "optime" : Timestamp(1412973972, 1),
                        "optimeDate" : ISODate("2014-10-10T20:46:12Z"),
                        "electionTime" : Timestamp(1412972423, 2),
                        "electionDate" : ISODate("2014-10-10T20:20:23Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "l7a973:27011",
                        "health" : 1,
                        "state" : 5,
                        "stateStr" : "STARTUP2",
                        "uptime" : 7,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:46:14Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:46:13Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 2,
                        "name" : "l7a973:27012",
                        "health" : 1,
                        "state" : 5,
                        "stateStr" : "STARTUP2",
                        "uptime" : 3,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:46:14Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:46:14Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}

8.  Connect to one of the shard servers (mongos)
mongo --host ${machine-name} --port 27017

9. Add the replication set to the shard.
mongos>sh.addShard("rs2/${machine-name}:27010")

10. Validate your setup.  You should see JSON output with a single PRIMARY and two SECONDARY records.
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "version" : 4,
        "minCompatibleVersion" : 4,
        "currentVersion" : 5,
        "clusterId" : ObjectId("54383c1cc3b7bde946a478cf")
}
  shards:
        {  "_id" : "rs1",  "host" : "rs1/l7a973:27000,l7a973:27001,l7a973:27002" }
        {  "_id" : "rs2",  "host" : "rs2/l7a973:27010,l7a973:27011,l7a973:27012" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "ngi",  "partitioned" : true,  "primary" : "rs1" }

Add Some Test Data and Validate the Distribution
1.  Connect to one of the shard servers (mongos)
mongo --host ${machine-name} --port 27017

2. Switch to your database.
mongos>use ${database-name}

3. Add 100,000 test data records.
mongos> for(var i=1; i<=100000; i++) { db.mycollection.insert({x:i}) }

4. Shard the collection based on the id.  Remember that "ngi" is my database name.
mongos> sh.shardCollection("ngi.mycollection", {_id: 1})
{ "collectionsharded" : "ngi.mycollection", "ok" : 1 }

5. Validate that the collection is sharded
mongos> db.mycollection.getShardDistribution()

Shard rs1 at rs1/l7a973:27000,l7a973:27001,l7a973:27002
 data : 602KiB docs : 12860 chunks : 1
 estimated data per chunk : 602KiB
 estimated docs per chunk : 12860

Shard rs2 at rs2/l7a973:27010,l7a973:27011,l7a973:27012
 data : 3.98MiB docs : 87140 chunks : 2
 estimated data per chunk : 1.99MiB
 estimated docs per chunk : 43570

Totals
 data : 4.57MiB docs : 100000 chunks : 3
 Shard rs1 contains 12.86% data, 12.86% docs in cluster, avg obj size on shard : 48B
 Shard rs2 contains 87.13% data, 87.14% docs in cluster, avg obj size on shard : 48B