131. Can I use index
keys to constrain query matches?
You can use the min() and max() methods to constrain the
results of the cursor returned from find() by using index keys.
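For example, the following is a minimal sketch, assuming a hypothetical products collection with an index on { price: 1 }; min() is an inclusive bound and max() is exclusive:
db.products.find().min( { price: 10 } ).max( { price: 100 } ).hint( { price: 1 } )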
132. Using $ne and
$nin in a query is slow. Why?
The $ne and $nin operators are not selective. If you need to
use these, it is often best to make sure that an additional, more selective
criterion is part of the query.
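For example, the following sketch pairs $ne with an equality match on a hypothetical status field, so the selective equality condition narrows the candidate documents before $ne is applied:
db.orders.find( { status: "A", qty: { $ne: 20 } } )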
133. Can I use a
multi-key index to support a query for a whole array?
Not entirely. The index can partially support these queries
because it can speed the selection of the first element of the array; however,
comparing all subsequent items in the array cannot use the index and must scan
the documents individually.
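As an illustration, assuming a hypothetical inventory collection with a multi-key index on { tags: 1 }, the whole-array query below can use the index only to find documents whose array contains "red" (the first element); comparing the rest of the array still requires examining the candidate documents:
db.inventory.find( { tags: [ "red", "blue" ] } )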
134. How can I use an effective index strategy for attribute lookups?
For simple attribute lookups that don't require sorted result sets or range queries, consider creating a field (e.g. attrib) that holds an array of sub-documents, where each sub-document stores one attribute as a key/value pair. You can then index this attrib field.
For example, the attrib field in the following document allows you to add an unlimited number of attribute types:
{ _id : ObjectId(...),
  attrib : [
    { k: "color", v: "red" },
    { k: "shape", v: "rectangle" },
    { k: "color", v: "blue" },
    { k: "avail", v: true }
  ]
}
Both of the following queries could use the same {
"attrib.k": 1, "attrib.v": 1 } index:
db.mycollection.find( { attrib: { $elemMatch : { k:
"color", v: "blue" } } } )
db.mycollection.find( { attrib: { $elemMatch : { k:
"avail", v: true } } } )
135. Where can I find
information about a mongod process that stopped running unexpectedly?
If mongod shuts down unexpectedly on a UNIX or UNIX-based
platform, and if mongod fails to log a shutdown or error message, then check
your system logs for messages pertaining to MongoDB. For example, for logs
located in /var/log/messages, use the following commands:
sudo grep mongod /var/log/messages
sudo grep score /var/log/messages
136. Does TCP
keepalive time affect sharded clusters and replica sets?
If you experience socket errors between members of a sharded cluster or replica set that do not have other reasonable causes, check the TCP keepalive value, which Linux systems store as the tcp_keepalive_time value. A
common keep alive period is 7200 seconds (2 hours); however, different
distributions and OS X may have different settings. For MongoDB, you will have
better experiences with shorter keepalive periods, on the order of 300 seconds
(five minutes).
On Linux systems you can use the following operation to
check the value of tcp_keepalive_time:
cat /proc/sys/net/ipv4/tcp_keepalive_time
You can change the tcp_keepalive_time value with the
following operation:
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
The new tcp_keepalive_time value takes effect without
requiring you to restart the mongod or mongos servers. When you reboot or
restart your system you will need to set the new tcp_keepalive_time value, or
see your operating system’s documentation for setting the TCP keepalive value
persistently.
For OS X systems, issue the following command to view the
keep alive setting:
sysctl net.inet.tcp.keepinit
To set a shorter keep alive period use the following
invocation:
sysctl -w net.inet.tcp.keepinit=300
If your replica set or sharded cluster experiences
keepalive-related issues, you must alter the tcp_keepalive_time value on all
machines hosting MongoDB processes. This includes all machines hosting mongos
or mongod servers.
Windows users should consult the Windows Server Technet article on KeepAliveTime configuration for more information on setting keep alive for MongoDB deployments on Windows systems.
137. What tools are
available for monitoring MongoDB?
The MongoDB Management Service (MMS) <http://mms.mongodb.com> includes monitoring. MMS Monitoring is a free, hosted service for monitoring MongoDB deployments. A full list of third-party tools is available as part of the
138. Do I need to
configure swap space?
Always configure systems to have swap space. Without swap, your system may not be resilient in some situations with extreme memory constraints, memory leaks, or multiple programs using the same memory. Think of
the swap space as something like a steam release valve that allows the system
to release extra pressure without affecting the overall functioning of the
system.
Nevertheless, systems running MongoDB do not need swap for
routine operation. Database files are memory-mapped and should constitute most
of your MongoDB memory use. Therefore, it is unlikely that mongod will ever use
any swap space in normal operation. The operating system will release memory
from the memory mapped files without needing swap and MongoDB can write data to
the data files without needing the swap system.
139. What is “working
set” and how can I estimate its size?
The working set for a MongoDB database is the portion of
your data that clients access most often. You can estimate the size of the working set using the workingSet document in the output of serverStatus. To return
serverStatus with the workingSet document, issue a command in the following form:
db.runCommand( { serverStatus: 1, workingSet: 1 } )
140. Must my working set size fit in RAM?
Your working set should stay in memory to achieve good performance. Otherwise many random disk I/Os will occur, and unless you are using SSDs, this can be quite slow.
One area to watch specifically in managing the size of your working set is index access patterns. If you are inserting into indexes at random locations (as would happen with ids that are randomly generated by hashes), you will continually be updating the whole index. If instead you are able to create your ids in approximately ascending order (for example, day concatenated with a random id), all the updates will occur at the right side of the b-tree and the working set size for index pages will be much smaller.
It is fine if databases and thus virtual size are much
larger than RAM.
141. How do I
calculate how much RAM I need for my application?
The amount of RAM you need depends on several factors,
including but not limited to:
• The relationship between database storage and working set.
• The operating system’s cache strategy for LRU (Least
Recently Used)
• The impact of journaling
• The number or rate of page faults and other MMS gauges to
detect when you need more RAM
• Each database connection thread will need up to 1 MB of
RAM.
MongoDB defers to the operating system when loading data
into memory from disk. It simply memory maps all its data files and relies on
the operating system to cache data. The OS typically evicts the least-recently-used data from RAM when it runs low on memory. For example, if clients access
indexes more frequently than documents, then indexes will more likely stay in
RAM, but it depends on your particular usage.
To calculate how much RAM you need, you must calculate your
working set size, or the portion of your data that clients use most often. This
depends on your access patterns, what indexes you have, and the size of your
documents.
Because MongoDB uses a thread per connection model, each
database connection also will need up to 1MB of RAM, whether active or idle.
If page faults are infrequent, your working set fits in RAM.
If fault rates rise higher than that, you risk performance degradation. This is
less critical with SSD drives than with spinning disks.
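As a rough sketch, you can watch two of the signals mentioned above from the shell; extra_info.page_faults is reported on Linux, and each connection accounts for up to 1 MB of RAM:
db.serverStatus().extra_info.page_faults
db.serverStatus().connections.current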
142. How do I read memory statistics in the UNIX top command?
Because mongod uses memory-mapped files, the memory
statistics in top require interpretation in a special way. On a large database,
VSIZE (virtual bytes) tends to be the size of the entire database. If the machine is not running other processes, RSIZE (resident bytes) approaches the total memory of the machine, as this counts file system cache contents.
For Linux systems, use the vmstat command to help determine
how the system uses memory. On OS X systems use vm_stat.
143. What are the factors for a successful sharded cluster?
The two most important factors in maintaining a successful
sharded cluster are:
• choosing an appropriate shard key and
• sufficient capacity to support current and future
operations.
You can prevent most issues encountered with sharding by choosing the best possible shard key for your deployment and by adding additional capacity to your cluster well before the current resources become saturated.
144. In a new sharded cluster, why does all data remain on one shard?
Your cluster must have sufficient data for sharding to make
sense. Sharding works by migrating chunks between the shards until each shard
has roughly the same number of chunks.
The default chunk size is 64 megabytes. MongoDB will not
begin migrations until the imbalance of chunks in the cluster exceeds the
migration threshold. While the default chunk size is configurable with the
chunkSize setting, these behaviors help prevent unnecessary chunk migrations,
which can degrade the performance of your cluster as a whole.
If you have just deployed a sharded cluster, make sure that
you have enough data to make sharding effective. If you do not have sufficient
data to create more than eight 64 megabyte chunks, then all data will remain on
one shard. Either lower the chunk size
setting, or add more data to the cluster.
As a related problem, the system will split chunks only on
inserts or updates, which means that if you configure sharding and do not continue to issue insert
and update operations, the database will not create any chunks. You can either
wait until your application inserts data or split chunks manually.
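A minimal sketch of a manual split; the namespace mydb.records and the shard key value below are hypothetical examples:
sh.splitAt( "mydb.records", { user_id: 1000 } )
sh.splitFind( "mydb.records", { user_id: 1000 } )   // alternatively, split the containing chunk at its median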
Finally, if your shard key has a low cardinality, MongoDB
may not be able to create sufficient splits among the data.
145. Why would one shard receive a disproportionate amount of traffic in a sharded cluster?
In some situations, a single shard or a subset of the
cluster will receive a disproportionate portion of the traffic and workload. In
almost all cases this is the result of a shard key that does not effectively
allow write scaling.
It’s also possible that you have “hot chunks.” In this case,
you may be able to solve the problem by splitting and then migrating parts of
these chunks.
In the worst case, you may have to consider re-sharding your
data and choosing a different shard key to correct this pattern.
146. What can prevent
a sharded cluster from balancing?
If you have just deployed your sharded cluster, you may want
to consider the troubleshooting suggestions for a new cluster where data
remains on a single shard.
If the cluster was initially balanced, but later developed
an uneven distribution of data, consider the following possible causes:
• You have deleted or removed a significant amount of data
from the cluster. If you have added additional data, it may have a different
distribution with regards to its shard key.
• Your shard key has low cardinality and MongoDB cannot
split the chunks any further.
• Your data set is growing faster than the balancer can distribute data around the cluster. This is uncommon and typically is the result of:
– a balancing window that is too short, given the rate of
data growth.
– an uneven distribution of write operations that requires
more data migration. You may have to choose a different shard key to resolve
this issue.
– poor network connectivity between shards, which may lead
to chunk migrations that take too long to complete. Investigate your network
configuration and interconnections between shards.
147. Why do chunk
migrations affect sharded cluster performance?
If migrations impact your cluster or application’s
performance, consider the following options, depending on the nature of the
impact:
1. If migrations only interrupt your cluster sporadically, you can limit the balancing window to prevent balancing activity during peak hours (see the sketch at the end of this answer). Ensure that there is enough time remaining to keep the data from becoming out of balance again.
2. If the balancer is always migrating chunks to the
detriment of overall cluster performance:
• You may want to attempt decreasing the chunk size to limit
the size of the migration.
• Your cluster may be over capacity, and you may want to
attempt to add one or two shards to the cluster to distribute load.
It’s also possible that your shard key causes your
application to direct all writes to a single shard. This kind of activity
pattern can require the balancer to migrate most data soon after writing it.
Consider redeploying your cluster with a shard key that provides better write
scaling.
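To set the balancing window mentioned in option 1, connect to a mongos, switch to the config database, and update the balancer settings document; the window times below are hypothetical examples:
use config
db.settings.update(
   { _id: "balancer" },
   { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
   { upsert: true }
)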
148. What is the default location of MongoDB data files and log files?
The MongoDB instance stores its data files in /var/lib/mongo and its log files in /var/log/mongodb
by default, and runs using the mongod user account. You can specify
alternate log and data file directories in /etc/mongodb.conf.
If you change the user that runs the MongoDB process, you must modify the access control rights to the /var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
149. Install MongoDB Enterprise on Red Hat Enterprise Linux or CentOS
Packages
MongoDB provides packages of the officially supported MongoDB Enterprise builds in its own repository. This repository provides the MongoDB Enterprise distribution in the following packages:
• mongodb-enterprise -- This package is a metapackage that will automatically install the four component packages listed below.
• mongodb-enterprise-server -- This package contains the mongod daemon and associated configuration and init scripts.
• mongodb-enterprise-mongos -- This package contains the mongos daemon.
• mongodb-enterprise-shell -- This package contains the mongo shell.
• mongodb-enterprise-tools -- This package contains the following MongoDB tools: bsondump, mongodump, mongoexport, mongofiles, mongoimport, mongooplog, mongoperf, mongorestore, mongostat, and mongotop.
Control Scripts
The
mongodb-enterprise package includes various control scripts, including the init
script /etc/rc.d/init.d/mongod.
The package
configures MongoDB using the /etc/mongod.conf file in conjunction with the
control scripts.
As of version
2.6.4, there are no control scripts for mongos. The mongos process is used only
in sharding.
You can use the
mongod init script to derive your own mongos control script.
Considerations
MongoDB only provides Enterprise packages for 64-bit builds of Red Hat Enterprise Linux and CentOS Linux versions 5 and 6.
The default /etc/mongod.conf configuration file supplied by the 2.6 series packages has bind_ip set to 127.0.0.1 by default. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6.
Install MongoDB Enterprise
When you
install the packages for MongoDB Enterprise, you choose whether to install the
current release or a previous one. This
procedure describes how to do both.
Step 1:
Configure repository. Create an /etc/yum.repos.d/mongodb-enterprise.repo file
so that you can install MongoDB enterprise directly, using yum.
Use the
following repository file to specify the latest stable release of MongoDB
enterprise.
[mongodb-enterprise]
name=MongoDB Enterprise Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/stable/$basearch/
gpgcheck=0
enabled=1
Use the
following repository to install only versions of MongoDB for the 2.6 release.
If you’d like to install MongoDB Enterprise packages from a particular release
series, such as 2.4 or 2.6, you can specify the release series in the
repository configuration. For example, to restrict your system to the 2.6
release series, create a /etc/yum.repos.d/mongodb-enterprise-2.6.repo file to
hold the following configuration information for the MongoDB Enterprise 2.6
repository:
[mongodb-enterprise-2.6]
name=MongoDB Enterprise 2.6 Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/2.6/$basearch/
gpgcheck=0
enabled=1
.repo files for
each release can also be found in the repository itself. Remember that
odd-numbered minor release versions (e.g. 2.5) are development versions and are
unsuitable for production deployment.
Step 2: Install the MongoDB Enterprise packages and associated tools. You can install either the latest stable version of MongoDB Enterprise or a specific version of MongoDB Enterprise.
Install the
latest stable version of MongoDB Enterprise. Issue the following command:
sudo yum install -y mongodb-enterprise
Step 3: Optional. Manage the installed version.
Install a specific release of MongoDB Enterprise. Specify each component package individually and append the version number to the package name, as in the following example that installs the 2.6.1 release of MongoDB:
sudo yum install -y mongodb-enterprise-2.6.1 mongodb-enterprise-server-2.6.1 mongodb-enterprise-shell-2.6.1 mongodb-enterprise-mongos-2.6.1 mongodb-enterprise-tools-2.6.1
Pin a specific version of MongoDB Enterprise. Although you can specify any available version of MongoDB Enterprise, yum will upgrade the packages when a newer version becomes available. To prevent unintended upgrades, pin the package. To pin a package, add the following exclude directive to your /etc/yum.conf file:
exclude=mongodb-enterprise,mongodb-enterprise-server,mongodb-enterprise-shell,mongodb-enterprise-mongos,mongodb-enterprise-tools
Previous versions of MongoDB packages use different naming conventions.
Step 4: When the install completes, you can run MongoDB.
Run MongoDB Enterprise
Important: You
must configure SELinux to allow MongoDB to start on Red Hat Linux-based systems
(Red Hat
Enterprise
Linux, CentOS, Fedora). Administrators have three options:
• enable access to the relevant ports for SELinux. For default settings, this can be accomplished by running:
semanage port -a -t mongodb_port_t -p tcp 27017
• set SELinux to permissive mode in /etc/selinux/config. The line
SELINUX=enforcing
should be changed to
SELINUX=permissive
• disable SELinux entirely; as above, but set
SELINUX=disabled
All three options require root privileges. The latter two options each require a system reboot and may have larger implications for your deployment.
150. Explain the start and stop process of MongoDB?
Step 1: Start MongoDB. You can start the
mongod process by issuing the following command:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. You can verify that the mongod process has started successfully
by checking the
contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port
<port>
where
<port> is the port configured in /etc/mongod.conf, 27017 by
default.
You can
optionally ensure that MongoDB will start following a system reboot by issuing
the following command:
sudo chkconfig mongod on
Step 3: Stop MongoDB. As needed, you can
stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the
mongod process by issuing the following command:
sudo service mongod restart
You can follow
the state of the process for errors or important messages by watching the
output in the
/var/log/mongodb/mongod.log
file.
Step 5: Begin using MongoDB.
151. What is the default port of MongoDB and in which configuration file is it set?
The port is configured in /etc/mongod.conf and is 27017 by default.
152. Which format is used to store data in MongoDB?
MongoDB doesn’t
actually use JSON to store the data; rather, it uses an open data format
developed by the MongoDB team called BSON
(pronounced Bee-Son), which is short for Binary-JSON. BSON makes MongoDB
even faster by making it much easier for a computer to process and search
documents. BSON also adds a couple of features that aren’t available in
standard JSON, including the ability to add types for handling binary data.
153. Explain & compare JSON & BSON?
JSON allows
complex data structures to be represented in a simple, human-readable text
format that is generally considered to be much easier to read and understand
than XML. Like XML, JSON was envisaged as a way to exchange data between a web
client (such as a browser) and web applications.
BSON is much easier to traverse (i.e., to look through) and can be indexed very quickly. Although BSON
requires slightly more disk space than JSON, this extra space is unlikely
to be a problem because disks are cheap, and MongoDB can scale across machines.
The second key
benefit to using BSON is that it is easy and quick to convert BSON to a
programming language’s native data format. If the data were stored in pure
JSON, a relatively high-level conversion would need to take place. There are
MongoDB drivers for a large number of programming languages (such as Python,
Ruby, PHP, C, C++ and C#), and each works slightly differently. Using a simple
binary format, native data structures can be quickly built for each language,
without requiring that you first process JSON. This makes the code simpler and
faster, both of which are in keeping with MongoDB’s stated goals.
BSON also
provides some extensions to JSON. For example, it enables you to store binary
data and incorporates a specific date type. Thus, while BSON can store any JSON
document, a valid BSON document may not be valid JSON. This doesn’t
matter because each language has its own driver that converts data to and from
BSON without needing to use JSON as an intermediary language.
154. Advantages of MongoDB over RDBMS
• Schema-less: MongoDB is a document database in which one collection holds different documents. The number of fields, content, and size of documents can differ from one document to another.
• Structure of a single object is clear.
• No complex joins.
• Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
• Tuning.
• Ease of scale-out: MongoDB is easy to scale.
• Conversion/mapping of application objects to database objects is not needed.
• Uses internal memory for storing the (windowed) working set, enabling faster access of data.
155. Feature List of MongoDB?
Using Document-Orientated Storage (BSON)
Supporting Dynamic Queries
Indexing Your Documents
Leveraging Geospatial Indexes
Profiling Queries
Updating Information In-Place
Storing Binary Data
Replicating Data
Implementing Auto Sharding
Using Map and Reduce Functions
156. Explain Version Numbers of MongoDB
MongoDB uses
the “odd-numbered
versions for development releases” approach. In other words, you can
tell by looking at the second number of the version number (also called the
release number) whether a version is a development
version or a stable version. If the
second number is even, then it’s a stable release. If the second number is an
odd number, then it’s an unstable, or development, release.
Let’s take a
closer look at the three digits included in a version number’s three parts, A,
B, and C:
A, the first
(or left-most) number: Represents the major version and only changes
when there is a
full version upgrade.
B, the second
(or middle) number: Represents the release number and indicates
whether a
version is a development version or a stable version. If the number is
even, the
version is stable; if the number is odd, then the version is unstable and
considered a
development release.
C, the third
(or right-most) number: Represents the revision number; this is used
for bugs and
security issues.
For example, at
the time of writing, the following versions were available from the
MongoDB website:
1.6.1
(Production release)
1.4.4 (Previous
release)
1.7.0-pre (Development release)
157. Installation Layout of MongoDB ?
After you install or extract MongoDB successfully, you will have the applications shown below available in the bin directory (in both Linux and Windows).
|-- bin
| |-- mongo
(the database shell)
| |-- mongod
(the core database server)
| |-- mongos (auto-sharding
process)
| |-- mongodump
(dump/export utility)
| `-- mongorestore (restore/import utility)
The installed
software includes five applications that you will be using in conjunction with
your MongoDB databases. The two “most important” applications are the mongo and
mongod applications.
The mongo application allows you to use the
database shell; this shell enables you to accomplish practically anything you’d
want to do with MongoDB.
The mongod application starts the service
or daemon, as it’s also called. There are also many flags you can set when
launching the MongoDB applications.
For example, the service lets you specify the path where the database is
located (--dbpath), show version information (--version), and even print some
diagnostic system information (with the --sysinfo flag)!
You can view the entire list of options by including the --help flag when you
launch the service. For now, you can just use the defaults and start the
service by typing mongod in your shell or command prompt.
Example of our test server
/usr/bin
--mongotop
--mongostat
--mongorestore
--mongoperf
--mongooplog
--mongoimport
--mongodump
--mongo
--mongos
--mongofiles
--mongoexport
--mongod
158. What is the test database in MongoDB?
If you start
the MongoDB service with the default parameters, and start the shell with the
default settings, then you will be connected to the default test database running on your local host. This database is
created automatically the moment you connect to it. This is one of MongoDB’s
most powerful features: if you attempt to connect to a database that
does not exist, MongoDB will automatically create it for you.
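A minimal sketch of this behavior from the shell; mydb and mycollection are hypothetical names:
use mydb
db.mycollection.insert( { x: 1 } )   // the database and collection are created on this first write
show dbs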
159. Explain the _id field in MongoDB?
Every object
within the MongoDB database contains a unique identifier to distinguish that
object from every other object. This unique identifier is called the _id key, and it is added automatically
to every document you create in a collection.
The _id key is
the first attribute added in each new document you create. This remains true
even if you do not tell MongoDB to create this key.
_id is a 12-byte value (usually displayed as a 24-character hexadecimal string) that ensures the uniqueness of every document. You can provide the _id yourself while inserting the document. If you do not provide one, MongoDB provides a unique id for every document.
If you do not
specify the _id value manually, then the type will be set to a special BSON
datatype that consists of a 12-byte binary value.
The 12-byte value consists of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and
a 3-byte counter. It’s good to know that the counter and timestamp fields are
stored in Big Endian. This is because MongoDB wants to ensure that
there is an increasing order to these values, and a Big Endian approach suits
this requirement best.
Every
additional supported driver that you load when working with MongoDB (such as
the PHP driver or the Python driver) supports this special BSON datatype and
uses it whenever new data is created.
You can also invoke ObjectId() from
the MongoDB shell to create a value for an _id key.
Optionally, you
can specify your own value by using ObjectId(string),
where string represents the specified hex string.
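A short sketch of both approaches; the hexadecimal string below is a made-up example value:
db.mycollection.insert( { item: "a" } )                                     // _id is added automatically
db.mycollection.insert( { _id: ObjectId(), item: "b" } )                    // explicitly generated ObjectId
db.mycollection.insert( { _id: ObjectId("53d98f94e3a5b2c4d8f01234"), item: "c" } )   // ObjectId from a hex string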
160. Explain Big Endian and Little Endian?
Big Endian and Little Endian refer to how the individual bytes of a longer data word are ordered in memory. Big Endian simply means that the most significant byte is saved first. Similarly, Little Endian means that the least significant byte is saved first.
161. What are the datatypes that can be used in a MongoDB document?
Possible types of data you can add to a document, and what you use them for:
• String: This commonly used datatype contains a string of text (or any other kind of characters). This datatype is used mostly for storing text values (e.g., { "Country" : "Japan" }).
• Integer (32-bit and 64-bit): This type is used to store a numerical value (e.g., { "Rank" : 1 }). Note that there are no quotes placed before or after the integer.
• Boolean: This datatype can be set to either TRUE or FALSE.
• Double: This datatype is used to store floating point values.
• Min / Max keys: This datatype is used to compare a value against the lowest and highest BSON elements, respectively.
• Arrays: This datatype is used to store arrays (e.g., ["Membrey, Peter", "Plugge, Eelco", "Hawkins, Tim"]).
• Timestamp: This datatype is used to store a timestamp. This can be handy for recording when a document has been modified or added.
• Object: This datatype is used for embedded documents.
• Null: This datatype is used for a Null value.
• Symbol: This datatype is used identically to a string (see above); however, it's generally reserved for languages that use a specific symbol type.
• Date *: This datatype is used to store the current date or time in UNIX time format (POSIX time).
• Object ID *: This datatype is used to store the document's ID.
• Binary data *: This datatype is used to store binary data.
• Regular expression *: This datatype is used for regular expressions. All options are represented by specific characters provided in alphabetical order.
• JavaScript Code *: This datatype is used for JavaScript code.
The last five datatypes (date, object id, binary data, regex, and JavaScript code) are non-JSON datatypes; specifically, they are special datatypes that BSON allows you to use.
162. File system snapshots for MongoDB backup?
File system snapshots are an operating system volume manager feature, and are not specific to MongoDB. The mechanics of snapshots depend on the underlying storage system. For example, Amazon's EBS storage system for EC2 supports snapshots, and on Linux the LVM manager can create a snapshot.
To get a correct snapshot of a running mongod process, you
must have journaling enabled and the journal must reside on the same logical
volume as the other MongoDB data files. Without journaling enabled, there is no
guarantee that
the snapshot will be consistent or valid.
To get a consistent snapshot of a sharded system, you must
disable the balancer and capture a snapshot from every shard and a config
server at approximately the same moment in time.
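For the sharded case, a sketch of pausing the balancer around the snapshot, run from a mongos:
sh.stopBalancer()
// ... capture a snapshot from every shard and a config server ...
sh.startBalancer()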
163. Backup with mongodump. Also mention pros & cons?
The mongodump tool reads data from a MongoDB database and
creates high fidelity BSON files. The mongorestore tool can populate a MongoDB
database with the data from these BSON files. These tools are simple and
efficient for backing up small MongoDB deployments, but are not ideal for
capturing backups of larger systems.
mongodump and mongorestore can operate against a running mongod process, or can manipulate the underlying data files directly. By default, mongodump does not capture the contents of the local database.
• mongodump only captures the documents in the database. The resulting backup is space efficient, but mongorestore or mongod must rebuild the indexes after restoring data.
• When connected to a MongoDB instance, mongodump can adversely affect mongod performance. If your data is larger than system memory, the queries will push the working set out of memory.
To mitigate the impact of mongodump on the performance of
the replica set, use mongodump to capture backups from a secondary member of a replica set. Alternatively, you
can shut down a secondary and use mongodump with the data files directly. If
you shut down a secondary to capture data with mongodump ensure that the operation
can complete before its oplog becomes too stale to continue replicating.
For replica sets, mongodump also supports a point in time
feature with the --oplog option. Applications may continue modifying data while
mongodump captures the output. To restore a point in time backup created with
--oplog, use mongorestore with the --oplogReplay option.
If applications
modify data while mongodump is creating a backup, mongodump will compete for
resources with those applications.
164. MongoDB Reporting Tools
This section
provides an overview of the reporting methods distributed with MongoDB.
Utilities The MongoDB distribution includes a number of utilities that quickly
return statistics about instances’ performance and activity. Typically, these
are most useful for diagnosing issues and assessing normal operation.
mongostat mongostat
captures and returns the counts of database operations by type (e.g. insert,
query, update, delete, etc.). These counts report on the load distribution on
the server. Use mongostat to understand the distribution of operation types and
to inform capacity planning.
mongotop mongotop
tracks and reports the current read and write activity of a MongoDB instance,
and reports these statistics on a per collection basis.
Use mongotop to
check if your database activity and use match your expectations.
REST Interface MongoDB provides a simple
REST interface that can be useful for configuring monitoring and alert scripts,
and for other administrative tasks.
To enable,
configure mongod to use REST, either by starting mongod with the --rest
option, or by setting the net.http.RESTInterfaceEnabled
setting to true in a configuration file.
HTTP Console MongoDB provides a web interface
that exposes diagnostic and monitoring information in a simple web page. The
web interface is accessible at localhost:<port>, where the <port>
number is 1000 more than the mongod port.
For example, if
a locally running mongod is using the default port 27017, access the HTTP
console at http://localhost:28017.
Commands MongoDB includes a number of
commands that report on the state of the database.
These data may
provide a finer level of granularity than the utilities discussed above.
Consider using their output in scripts and programs to develop custom alerts,
or to modify the behavior of your application in response to the activity of
your instance. The db.currentOp method is another useful tool for identifying the
database instance’s in-progress operations.
serverStatus The serverStatus command, or db.serverStatus()
from the shell, returns a general overview of the status of the database,
detailing disk usage, memory use, connection, journaling, and index access.
The command
returns quickly and does not impact MongoDB performance.
serverStatus
outputs an account of the state of a MongoDB instance. This command is rarely
run directly. In most cases, the data is more meaningful when aggregated, as
one would see with monitoring tools including MMS.
Nevertheless,
all administrators should be familiar with the data provided by serverStatus.
dbStats The dbStats command, or db.stats()
from the shell, returns a document that addresses storage
use and data volumes. The dbStats reflect the amount of storage used, the
quantity of data contained in the database, and object, collection, and index
counters.
Use this data
to monitor the state and storage capacity of a specific database. This output
also allows you to compare use between databases and to determine the average
document size in a database.
collStats The collStats provides statistics
that resemble dbStats on the collection level, including a count of the objects
in the collection, the size of the collection, the amount of disk space used by
the collection, and information about its indexes.
replSetGetStatus The replSetGetStatus command
(rs.status() from the shell) returns an overview of your replica set’s status.
The replSetGetStatus document details the state and configuration of the
replica set and statistics about its members. Use this data to ensure that
replication is properly configured, and to check the connections between the
current host and the other members of the replica set.
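A minimal sketch of the shell helpers for the commands discussed above; mycollection is a hypothetical collection name:
db.serverStatus()         // serverStatus
db.stats()                // dbStats
db.mycollection.stats()   // collStats
rs.status()               // replSetGetStatus, on a replica set member
db.currentOp()            // in-progress operations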
Third Party
Tools A number of third party monitoring tools have support for MongoDB, either
directly, or through their own plugins.
165.Run Multiple Database Instances on the Same System
In many cases
running multiple instances of mongod on a single system is not recommended. On
some types of deployments and for testing purposes you may need to run more
than one mongod on a single system.
In these cases,
use a base configuration for each instance, but consider the following
configuration values:
dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid
The dbPath
value controls the location of the mongod instance’s data directory. Ensure
that each database has a distinct and well labeled data directory. The
pidFilePath controls where the mongod process places its process id file. As this tracks the specific mongod process, it is crucial that this file be unique and well
labeled to make it easy to start and stop these processes.
Create
additional control scripts and/or adjust your existing MongoDB configuration
and control script as needed to control these processes.
166. What are diagnostic configurations for performance issues?
The following configuration options control various mongod behaviors for diagnostic purposes. The following settings have default values that are tuned for general production purposes:
slowms = 50
profile = 3
verbose = true
objcheck = true
Use the base
configuration and add these options if you are experiencing some unknown issue
or performance problem as needed:
• slowOpThresholdMs configures the threshold for considering a query "slow," for the purpose of the logging system and the database profiler. The default value is 100 milliseconds. Set a lower value if the database profiler does not return useful results, or a higher value to only log the longest running queries.
• mode sets the
database profiler level. The profiler is not active by default because of the
possible impact of the profiler itself on performance. Unless this setting has
a value, queries are not profiled.
• verbosity
controls the amount of logging output that mongod writes to the log. Only use
this option if you are experiencing an issue that is not reflected in the
normal logging level.
•
wireObjectCheck forces mongod to validate all requests from clients upon
receipt. Use this option to ensure that invalid requests are not causing
errors, particularly when running a database with untrusted clients.
This option may
affect database performance.
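You can also enable the profiler at run time from the shell rather than the configuration file; this sketch uses level 1 (profile operations slower than the supplied threshold, 100 ms here) and then inspects the most recent profiled operations:
db.setProfilingLevel(1, 100)
db.system.profile.find().sort( { ts: -1 } ).limit(5)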
167. MongoDB Performance Monitoring by OS commands ?
iostat On
Linux, use the iostat command to check if disk I/O is a bottleneck for your
database. Specify a number of seconds when running iostat to avoid displaying
stats covering the time since server boot.
For example,
the following command will display extended statistics and the time for each
displayed report, with traffic in MB/s, at one second intervals:
iostat -xmt 1
Key fields from iostat:
• %util: this is the most useful field for a
quick check, it indicates what percent of the time the device/drive is in use.
• avgrq-sz: average request size. Smaller
numbers for this value reflect more random IO operations.
bwm-ng bwm-ng is a command-line tool for monitoring network use. If you suspect a network-based bottleneck, you may use bwm-ng to begin your diagnostic process.
168. What is Connection Pools and use?
To avoid
overloading the connection resources of a single mongod or mongos instance,
ensure that clients maintain reasonable connection pool sizes.
The connPoolStats
database command returns information regarding the number of open connections
to the current database for mongos instances and mongod instances in sharded
clusters.
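A minimal sketch of checking connection usage from the shell:
db.runCommand( { connPoolStats: 1 } )
db.serverStatus().connections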
169. Is authorization enabled by default in MongoDB?
By default,
authorization is not enabled and mongod assumes a trusted environment. You can
enable security/auth mode if you need it.
170. Collection Export with mongoexport
With the mongoexport
utility you can create a backup file. In the most simple invocation, the
command takes the following form:
mongoexport --collection collection
--out collection.json
This will
export all documents in the collection named collection into the file
collection.json. Without the output specification (i.e. “--out
collection.json”), mongoexport writes output to standard output (i.e.
“stdout”). You can further narrow the
results by supplying a query filter using the “--query” and limit results to a
single database using the “--db” option. For instance:
mongoexport --db sales --collection
contacts --query '{"field": 1}'
This command
returns all documents in the sales database’s contacts collection, with a field
named field with a value of 1. Enclose the query in single quotes (e.g. ’) to
ensure that it does not interact with your shell environment. The resulting
documents will return on standard output.
By default, mongoexport returns one JSON document per
MongoDB document. Specify the “--jsonArray”
argument to return the export as a single JSON array. Use the “--csv” option to return the result in CSV (comma separated values) format.
If your mongod
instance is not running, you can use the “--dbpath”
option to specify the location to your MongoDB instance’s database files. See
the following example:
mongoexport --db sales --collection
contacts --dbpath /srv/MongoDB/
This reads the
data files directly. This locks the data directory to prevent conflicting
writes. The mongod process must not be running or attached to these data files
when you run mongoexport in this configuration.
The “--host”
and “--port” options allow you to specify a
non-local host to connect to capture the export. Consider the following
example:
mongoexport --host
mongodb1.example.net --port 37017 --username user --password pass --collection
contacts
On any mongoexport command you may, as above, specify username and password credentials.
171. Collection Import with mongoimport
Use mongoimport to restore a backup taken with mongoexport. Most of the arguments to mongoexport also exist for mongoimport. Consider the following command:
mongoimport --collection collection
--file collection.json
This imports
the contents of the file collection.json into the collection named collection.
If you do not specify a file with the “--file”
option, mongoimport accepts input over standard input (e.g. “stdin.”)
If you specify
the “--upsert” option, all mongoimport
operations will attempt to update existing documents in the database and insert
other documents. This option will cause some performance impact depending on
your configuration.
You can specify
the database option --db
to import these documents to a particular database. If your MongoDB instance is
not running, use the “--dbpath”
option to specify the location of your MongoDB instance’s database files.
Consider using the “--journal”
option to ensure that mongoimport records its operations in the journal.
The mongod
process must not be running or attached to these data files when you run
mongoimport in this configuration.
Use the “--ignoreBlanks”
option to ignore blank fields. For CSV and TSV imports, this option provides
the desired functionality in most cases: it avoids inserting blank fields in
MongoDB documents.
172. What are parts of GridFS?
GridFS consists
of two parts. More specifically, it consists of two collections. One collection
holds the filename and related information such as size (called metadata),
while the other collection holds the file data itself, usually in 256k chunks.
The specification calls for these to be named files and chunks respectively. By
default, the files and chunks collections are created in the fs namespace, but
this can be
changed. The
ability to change the default namespace is useful if you want to store
different types of files. For example, you might want to keep image and movie
files separate.
173. How do you limit the number of items added to a capped collection? How does it work?
You can also
limit the number of items added into a capped collection using the max:
parameter when you create the collection. However, you must take care that you
ensure that there is enough space in the collection for the number of items you
want to add. If the collection becomes full before the number of items has been
reached, the oldest item in the collection will be removed.
The MongoDB
shell includes a utility that lets you see the amount of space used by an
existing collection, whether it’s capped or uncapped. You invoke this utility
using the validate() function. This can be particularly useful if you want to
estimate how large a collection might become.
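A short sketch of both points; the collection name and numbers are examples only:
db.createCollection( "log", { capped: true, size: 1048576, max: 1000 } )   // cap by bytes and by document count
db.log.validate()   // report on the space used by an existing collection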
174. What are the limitations of capped collections regarding update and delete operations?
Documents
already added to a capped collection can be updated, but they must not grow in
size. The update will fail if they do.
Deleting
documents from a capped collection is also not possible; instead, the entire
collection must
be dropped and re-created if you want to do this.