Friday, October 10, 2014

Setting Up a MongoDB Cluster

I recently setup a MongoDB cluster on my workstation.  While the Mongo documentation is very good, the proper setup is scattered between the Replication and Sharding documentation.  I figured this might be helpful to others working with Mongo for the first time.  Obviously this is not a valid production setup!  For this, you'd need to place the replication sets, shard servers and config servers on separate machines.

The commands rs.help() and sh.help() come in handy along the way.  Additional useful commands include rs.status(), rs.printReplicationInfo(), rs.printSlaveReplicationInfo(), sh.status(), and db.{$collection-name}.getShardDistribution().

Config Server Setup for 3 Nodes
1. Create a data directory for each of the three config server nodes.
mkdir c:\data\cluster\config\node1
mkdir c:\data\cluster\config\node2
mkdir c:\data\cluster\config\node3

2. Start the 3 config server nodes.
mongod --configsvr --dbpath c:\data\cluster\config\node1 --bind_ip ${machine-name} --port 27020
mongod --configsvr --dbpath c:\data\cluster\config\node2 --bind_ip ${machine-name} --port 27021
mongod --configsvr --dbpath c:\data\cluster\config\node3 --bind_ip ${machine-name} --port 27022

Start 2 MongoDB Shard Servers (mongos) Used By Client Apps
1. Start the 2 mongos instances.
mongos --configdb ${machine-name}:27020,${machine-name}:27021,${machine-name}:27022 --port 27017
mongos --configdb ${machine-name}:27020,${machine-name}:27021,${machine-name}:27022 --port 27018

Replication Set (RS) and Shard Setup for Shard 1
1. Create the data directory for each node.
mkdir c:\data\cluster\node1
mkdir c:\data\cluster\node2
mkdir c:\data\cluster\node3

2. Start the 3 nodes.
mongod --dbpath c:\data\cluster\node1 --bind_ip ${machine-name} --port 27000 --replSet "rs1"
mongod --dbpath c:\data\cluster\node2 --bind_ip ${machine-name} --port 27001 --replSet "rs1"
mongod --dbpath c:\data\cluster\node3 --bind_ip ${machine-name} --port 27002 --replSet "rs1"

3. Connect to one of the nodes.
mongo --host ${machine-name} --port 27000

4. Create the replication set.
rs.initiate()

5. Add the second node to the replication set.
rs.add("${machine-name}:27001")

6. Add the third node to the replication set.
rs.add("${machine-name}:27002")

7. Validate your replication setup.  You should see JSON output with a single PRIMARY and two SECONDARY records.
rs1:PRIMARY> rs.status()
{
        "set" : "rs1",
        "date" : ISODate("2014-10-10T20:44:52Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "l7a973:27000",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1989,
                        "optime" : Timestamp(1412972013, 1),
                        "optimeDate" : ISODate("2014-10-10T20:13:33Z"),
                        "electionTime" : Timestamp(1412971998, 2),
                        "electionDate" : ISODate("2014-10-10T20:13:18Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "l7a973:27001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1882,
                        "optime" : Timestamp(1412972013, 1),
                        "optimeDate" : ISODate("2014-10-10T20:13:33Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:44:50Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:44:51Z"),
                        "pingMs" : 0,
                        "syncingTo" : "l7a973:27000"
                },
                {
                        "_id" : 2,
                        "name" : "l7a973:27002",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1879,
                        "optime" : Timestamp(1412972013, 1),
                        "optimeDate" : ISODate("2014-10-10T20:13:33Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:44:51Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:44:50Z"),
                        "pingMs" : 0,
                        "syncingTo" : "l7a973:27000"
                }
        ],
        "ok" : 1
}

8.  Connect to one of the shard servers (mongos).
mongo --host ${machine-name} --port 27017

9. Add the replication set to the shard.
sh.addShard("rs1/${machine-name}:27000")

10. Enable sharding for the database (I'm using "ngi" for my database name).
sh.enableSharding("${database-name}")

Replication Set (RS) and Shard Setup for Shard 2 
1. Create the data directory for each node.
mkdir c:\data\cluster\node4
mkdir c:\data\cluster\node5
mkdir c:\data\cluster\node6

2. Start the 3 nodes.
mongod --dbpath c:\data\cluster\node4 --bind_ip ${machine-name} --port 27010 --replSet "rs2"
mongod --dbpath c:\data\cluster\node5 --bind_ip ${machine-name} --port 27011 --replSet "rs2"
mongod --dbpath c:\data\cluster\node6 --bind_ip ${machine-name} --port 27012 --replSet "rs2"

3. Connect to one of the nodes.
mongo --host ${machine-name} --port 27010

4. Create the replication set.
rs.initiate()

5. Add the second node to the replication set.
rs.add("${machine-name}:27011")

6. Add the third node to the replication set.
rs.add("${machine-name}:27012")

7. Validate your replication setup.  You should see JSON output with a single PRIMARY and two SECONDARY records.
rs2:PRIMARY> rs.status()
{
        "set" : "rs2",
        "date" : ISODate("2014-10-10T20:46:15Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "l7a973:27010",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1642,
                        "optime" : Timestamp(1412973972, 1),
                        "optimeDate" : ISODate("2014-10-10T20:46:12Z"),
                        "electionTime" : Timestamp(1412972423, 2),
                        "electionDate" : ISODate("2014-10-10T20:20:23Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "l7a973:27011",
                        "health" : 1,
                        "state" : 5,
                        "stateStr" : "STARTUP2",
                        "uptime" : 7,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:46:14Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:46:13Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 2,
                        "name" : "l7a973:27012",
                        "health" : 1,
                        "state" : 5,
                        "stateStr" : "STARTUP2",
                        "uptime" : 3,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2014-10-10T20:46:14Z"),
                        "lastHeartbeatRecv" : ISODate("2014-10-10T20:46:14Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}

8.  Connect to one of the shard servers (mongos)
mongo --host ${machine-name} --port 27017

9. Add the replication set to the shard.
mongos>sh.addShard("rs2/${machine-name}:27010")

10. Validate your setup.  You should see JSON output with a single PRIMARY and two SECONDARY records.
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "version" : 4,
        "minCompatibleVersion" : 4,
        "currentVersion" : 5,
        "clusterId" : ObjectId("54383c1cc3b7bde946a478cf")
}
  shards:
        {  "_id" : "rs1",  "host" : "rs1/l7a973:27000,l7a973:27001,l7a973:27002" }
        {  "_id" : "rs2",  "host" : "rs2/l7a973:27010,l7a973:27011,l7a973:27012" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "ngi",  "partitioned" : true,  "primary" : "rs1" }

Add Some Test Data and Validate the Distribution
1.  Connect to one of the shard servers (mongos)
mongo --host ${machine-name} --port 27017

2. Switch to your database.
mongos>use ${database-name}

3. Add 100,000 test data records.
mongos> for(var i=1; i<=100000; i++) { db.mycollection.insert({x:i}) }

4. Shard the collection based on the id.  Remember that "ngi" is my database name.
mongos> sh.shardCollection("ngi.mycollection", {_id: 1})
{ "collectionsharded" : "ngi.mycollection", "ok" : 1 }

5. Validate that the collection is sharded
mongos> db.mycollection.getShardDistribution()

Shard rs1 at rs1/l7a973:27000,l7a973:27001,l7a973:27002
 data : 602KiB docs : 12860 chunks : 1
 estimated data per chunk : 602KiB
 estimated docs per chunk : 12860

Shard rs2 at rs2/l7a973:27010,l7a973:27011,l7a973:27012
 data : 3.98MiB docs : 87140 chunks : 2
 estimated data per chunk : 1.99MiB
 estimated docs per chunk : 43570

Totals
 data : 4.57MiB docs : 100000 chunks : 3
 Shard rs1 contains 12.86% data, 12.86% docs in cluster, avg obj size on shard : 48B
 Shard rs2 contains 87.13% data, 87.14% docs in cluster, avg obj size on shard : 48B

Tuesday, October 30, 2012

The Future of Analytics and Business Intelligence

Been excitedly waiting to see what comes from Numenta since ~2006. Can't wait to see what's next. If I could go work for them tomorrow I would. Can't wait to see what Hawkins is talking about 20 years from now.

Clojure!

Spent some time looking at Clojure today. Definitely mind-bending given I've never worked with a Lisp dialect. Downloaded and walked trough the examples on the site. I will definitely be taking a deeper dive if nothing more than to help diversify my perspective. I'm interested in learning more including ClojureScript.

Wednesday, July 18, 2012

Habitat for Humanity

I spent the day yesterday with 15 co-workers helping to build a Habitat for Humanity house. It was fun, felt good and was less stressful than work. We spent the day putting siding on the house. I enjoyed challenging my fear of heights standing on scaffolding at the peak of the roof while using a nail gun.

Tuesday, December 29, 2009

HTML Parsing With Groovy and TagSoup

I'm working on an app where I need to parse some HTML. This is the first time I've had to do screen-scraping with Groovy. After a bit of trial and error I think I'm getting the hang of it. The HTML I'm working with isn't well-formed, so the default Groovy XmlSlurper and XmlParser puke. After some digging I found TagSoup. It "parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short".

It made my parsing much easier. Thanks John Cowan!