61. When should I embed documents within other documents?
When modeling data in MongoDB, embedding is frequently the choice for:
• “contains” relationships between entities.
• one-to-many relationships when the “many” objects always appear with, or are viewed in the context of, their parents.
You should also consider embedding for performance reasons if you have a collection with a large number of small documents. Nevertheless, if small, separate documents represent the natural model for the data, then you should maintain that model.
If, however, you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider “rolling up” the small documents into larger documents that contain an array of subdocuments. Keep in mind that if you often need to retrieve only a subset of the documents within the group, then “rolling up” the documents may not provide better performance.
“Rolling up” these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses.
Additionally, “rolling up” documents and moving common fields to the larger document benefits the index on these fields: there are fewer copies of the common fields and fewer associated key entries in the corresponding index.
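For example, here is a minimal sketch of the embedded one-to-many pattern described above; the collection and field names are illustrative:
db.patrons.insert( {
    _id: "joe",
    name: "Joe Bookreader",
    // The addresses always appear with their parent, so embed them as an
    // array of subdocuments rather than storing them as separate documents:
    addresses: [
        { street: "123 Fake Street", city: "Faketon", state: "MA" },
        { street: "1 Some Other Street", city: "Boston", state: "MA" }
    ]
} );
// A single read now returns the patron together with all of its addresses:
db.patrons.findOne( { _id: "joe" } );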
62. Can I manually pad documents to prevent moves during updates?
An update can cause a document to move on disk if the document grows in size. To minimize document movements, MongoDB uses padding.
You should not have to pad manually because MongoDB adds padding automatically and can adaptively adjust the amount of padding added to documents to prevent document relocations following updates. You can change the default paddingFactor calculation by using the collMod command with the usePowerOf2Sizes flag. The usePowerOf2Sizes flag ensures that MongoDB allocates document space in sizes that are powers of 2, which helps ensure that MongoDB can efficiently reuse free space created by document deletion or relocation.
However, if you must pad a document manually, you can add a temporary field to the document and then $unset the field, as in the following example.
Warning: Do not manually pad documents in a capped collection. Applying manual padding to a document in a capped collection can break replication. Also, the padding is not preserved if you re-sync the MongoDB instance.
var myTempPadding = [
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
];
db.myCollection.insert( { _id: 5, paddingField: myTempPadding } );
db.myCollection.update( { _id: 5 },
                        { $unset: { paddingField: "" } }
);
db.myCollection.update( { _id: 5 },
                        { $set: { realField: "Some text that I might have needed padding for" } }
);
63. How can I enter multi-line operations in the mongo shell?
If you end a line with an open parenthesis ('('), an open brace ('{'), or an open bracket ('['), then the subsequent lines start with an ellipsis ("...") until you enter the corresponding closing parenthesis (')'), closing brace ('}'), or closing bracket (']'). The mongo shell waits for the closing parenthesis, closing brace, or closing bracket before evaluating the code, as in the following example:
> if ( x > 0 ) {
... count++;
... print (x);
... }
You can exit the line continuation mode if you enter two
blank lines, as in the following example:
> if (x > 0
...
...
>
64. How can I access different databases temporarily?
You can use the db.getSiblingDB() method to access another database without switching databases, as in the following example, which first switches to the test database and then accesses the sampleDB database from the test database:
use test
db.getSiblingDB('sampleDB').getCollectionNames();
65. Does the mongo shell support tab completion and other keyboard shortcuts?
The mongo shell supports keyboard shortcuts. For example:
• Use the up/down arrow keys to scroll through command history.
• Use <Tab> to autocomplete or to list the completion possibilities, as in the following example, which uses <Tab> to complete the method name starting with the letter 'c':
db.myCollection.c<Tab>
Because there are many collection methods starting with the letter 'c', <Tab> will list the various methods that start with 'c'.
66. How can I customize the mongo shell prompt?
New in version 1.9.
You can change the mongo shell prompt by setting the prompt variable. This makes it possible to display additional information in the prompt.
Set prompt to any string or to arbitrary JavaScript code that returns a string. Consider the following examples:
• Set the shell prompt to display the hostname and the current database:
var host = db.serverStatus().host;
var prompt = function() { return db + "@" + host + "> "; }
The mongo shell prompt should now reflect the new prompt:
test@my-machine.local>
• Set the shell prompt to display database statistics:
var prompt = function() {
    return "Uptime:" + db.serverStatus().uptime + " Documents:" + db.stats().objects + " > ";
}
The mongo shell prompt should now reflect the new prompt:
Uptime:1052 Documents:25024787 >
You can add the logic for the prompt in the .mongorc.js file to set the prompt each time you start up the mongo shell.
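For example, a minimal sketch of a .mongorc.js entry; the prompt format is illustrative:
// ~/.mongorc.js -- evaluated each time the mongo shell starts
var prompt = function() {
    return db + "> ";   // show the current database in the prompt
};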
67. Can I edit long shell operations with an external text editor?
You can use your own editor in the mongo shell by setting the EDITOR environment variable before starting the mongo shell. Once in the mongo shell, you can edit with the specified editor by typing edit <variable> or edit <function>, as in the following example:
1. Set the EDITOR variable from the command line prompt:
export EDITOR=vim
2. Start the mongo shell:
mongo
3. Define a function myFunction:
function myFunction () { }
4. Edit the function using your editor:
edit myFunction
The command should open a vim edit session. Remember to save your changes.
5. Type myFunction to see the function definition:
myFunction
The result should be the changes from your saved edit:
function myFunction() {
print("This was edited");
}
68. What type of locking does MongoDB use?
MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation.
When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock.
Locks are “writer greedy,” which means write locks have preference over reads. When both a read and a write are waiting for a lock, MongoDB grants the lock to the write.
69. How granular are locks in MongoDB?
Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short-lived operations involving multiple databases, still require a global “instance-wide” lock.
Before 2.2, there is only one “global” lock per mongod instance.
For example, if you have six databases and one takes a database-level write lock, the other five are still available for reads and writes. A global lock makes all six databases unavailable during the operation.
70. How do I see the status of locks on my mongod instances?
To report on lock utilization, use any of the following methods:
• db.serverStatus(),
• db.currentOp(),
• mongotop,
• mongostat, or
• the MongoDB Management Service (MMS).
Specifically, the locks document in the output of serverStatus, or the locks field in the current operation reporting, provides insight into the types of locks and the amount of lock contention in your mongod instance.
To terminate an operation, use db.killOp().
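For example, a minimal sketch of inspecting lock information from the mongo shell; the exact field layout varies by version:
// Per-database lock statistics:
printjson( db.serverStatus().locks );
// Lock information for in-progress operations:
db.currentOp().inprog.forEach( function (op) {
    if (op.locks) printjson( { opid: op.opid, locks: op.locks } );
} );
// Terminate a long-running operation by its opid, for example:
// db.killOp(12345);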
71. Does a read or write operation ever yield the lock?
In some situations, read and write operations can yield their locks.
Long-running read and write operations, such as queries, updates, and deletes, yield under many conditions. MongoDB uses an adaptive algorithm to allow operations to yield locks based on predicted disk access patterns (i.e. page faults).
MongoDB operations can also yield locks between individual document modifications in write operations that affect multiple documents, such as update() with the multi parameter.
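For instance, a multi-document update of the following form may yield its write lock between individual document modifications; the collection and field names are illustrative:
db.inventory.update(
    { status: "active" },           // match many documents
    { $set: { reviewed: true } },
    { multi: true }                 // update all matching documents
);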
MongoDB uses heuristics based on its access pattern to predict whether data is likely in physical memory before performing a read. If MongoDB predicts that the data is not in physical memory, an operation will yield its lock while MongoDB loads the data into memory. Once the data is available in memory, the operation will reacquire the lock to complete the operation.
Changed in version 2.6: MongoDB does not yield locks when scanning an index, even if it predicts that the index is not in memory.
72. Which operations lock the database?
Changed in version 2.2.
The following list gives common database operations and the types of locks they use:
• Issue a query: Read lock.
• Get more data from a cursor: Read lock.
• Insert data: Write lock.
• Remove data: Write lock.
• Update data: Write lock.
• Map-reduce: Read lock and write lock, unless operations are specified as non-atomic. Portions of map-reduce jobs can run concurrently.
• Create an index: Building an index in the foreground, which is the default, locks the database for extended periods of time.
• db.eval(): Write lock. The db.eval() method takes a global write lock while evaluating the JavaScript function. To avoid taking this global write lock, you can use the eval command with nolock: true.
• eval: Write lock. By default, the eval command takes a global write lock while evaluating the JavaScript function. If used with nolock: true, the eval command does not take a global write lock while evaluating the JavaScript function. However, the logic within the JavaScript function may take write locks for write operations.
• aggregate(): Read lock.
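For example, a minimal sketch of running eval without the global write lock, for a read-only function; the collection name is illustrative:
db.runCommand( {
    eval: function() { return db.orders.count(); },  // must not perform writes
    nolock: true                                     // skip the global write lock
} );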
73. Which administrative commands lock the database?
Certain administrative commands can exclusively lock the database for extended periods of time. In some deployments, for large databases, you may consider taking the mongod instance offline so that clients are not affected. For example, if a mongod is part of a replica set, take the mongod offline and let other members of the set service the load while maintenance is in progress.
The following administrative operations require an exclusive (i.e. write) lock on the database for extended periods:
• db.collection.ensureIndex(), when issued without setting background to true,
• reIndex,
• compact,
• db.repairDatabase(),
• db.createCollection(), when creating a very large (i.e. many gigabytes) capped collection,
• db.collection.validate(), and
• db.copyDatabase(). This operation may lock all databases.
The following administrative commands lock the database but only hold the lock for a very short time:
• db.collection.dropIndex(),
• db.getLastError(),
• db.isMaster(),
• rs.status() (i.e. replSetGetStatus),
• db.serverStatus(),
• db.auth(), and
• db.addUser().
74. Does a MongoDB operation ever lock more than one database?
The following MongoDB operations lock multiple databases:
• db.copyDatabase() must lock the entire mongod instance at once.
• db.repairDatabase() obtains a global write lock and will block other operations until it finishes.
• Journaling, which is an internal operation, locks all databases for short intervals. All databases share a single journal.
• User authentication requires a read lock on the admin database for deployments using 2.6 user credentials. For deployments using the 2.4 schema for user credentials, authentication locks the admin database as well as the database the user is accessing.
• All writes to a replica set’s primary lock both the database receiving the writes and the local database for a short time. The lock for the local database allows the mongod to write to the primary’s oplog and accounts for a small portion of the total time of the operation.
75. How does sharding affect concurrency?
Sharding
improves concurrency by distributing collections over multiple mongod
instances, allowing shard servers (i.e. mongos processes) to perform any number
of operations concurrently to the various downstream mongod instances.
Each mongod
instance is independent of the others in the shard cluster and uses the MongoDB
readers-writer lock. The operations on one mongod instance do not block the
operations on any others.
76. How does concurrency affect a replica set primary?
In replication,
when MongoDB writes to a collection on the primary, MongoDB also writes to the
primary’s oplog, which is a special collection in the local database.
Therefore, MongoDB must lock both the collection’s database and the local
database. The mongod must lock both databases at the same time to keep the
database consistent and ensure that write operations, even with replication,
are “all-or-nothing” operations.
77. How does concurrency affect secondaries?
In replication, MongoDB does not apply writes serially to secondaries. Secondaries collect oplog entries in batches and then apply those batches in parallel. Secondaries do not allow reads while applying the write operations, and apply write operations in the order that they appear in the oplog.
MongoDB can apply several writes in parallel on replica set secondaries, in two phases:
1. During the first, prefetch phase, under a read lock, the mongod ensures that all documents affected by the operations are in memory. During this phase, other clients may execute queries against this member.
2. A thread pool using write locks applies all write operations in the batch as part of a coordinated write phase.
78. What kind of concurrency does MongoDB provide for JavaScript operations?
Changed in version 2.4: The V8 JavaScript engine added in
2.4 allows multiple JavaScript operations to run at the same time. Prior to
2.4, a single mongod could only run a single JavaScript operation at once.
79. Can I change the shard key after sharding a collection?
No. There is no automatic support in MongoDB for changing a shard key after sharding a collection. This underscores the importance of choosing a good shard key. If you must change a shard key after sharding a collection, the best option is to:
• dump all data from MongoDB into an external format.
• drop the original sharded collection.
• configure sharding using a more ideal shard key.
• pre-split the shard key range to ensure initial even
distribution.
• restore the dumped data into MongoDB.
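For the re-sharding and pre-splitting steps, a minimal mongo shell sketch; the namespace, new shard key, and split points are all illustrative:
sh.shardCollection( "mydb.events", { customer_id: 1 } );   // the new shard key
// Pre-split the new key's range so the restored data distributes evenly:
sh.splitAt( "mydb.events", { customer_id: 2500 } );
sh.splitAt( "mydb.events", { customer_id: 5000 } );
sh.splitAt( "mydb.events", { customer_id: 7500 } );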
80. What happens to unsharded collections in sharded databases?
In the current implementation, all databases in a sharded cluster have a “primary shard.” All unsharded collections within that database will reside on the same shard.
81. How does MongoDB distribute data across shards?
Sharding must be specifically enabled on a collection. After
enabling sharding on the collection, MongoDB will assign various ranges of
collection data to the different shards in the cluster. The cluster
automatically corrects imbalances between shards by migrating ranges of data
from one shard to another.
82. What happens if a client updates a document in a chunk during a migration?
The mongos routes the operation to the “old” shard, where it will succeed immediately. Then the shard mongod instances will replicate the modification to the “new” shard before the sharded cluster updates that chunk’s “ownership,” which effectively finalizes the migration process.
83. How does MongoDB distribute queries among shards?
Changed in version 2.0.
The exact method for distributing queries to shards in a
cluster depends on the nature of the query and the configuration of the sharded
cluster. Consider a sharded collection, using the shard key user_id, that has
last_login and email attributes:
• For a query that selects one or more values for the
user_id key:
mongos determines which shard or shards contains the
relevant data, based on the cluster metadata, and directs a query to the required shard or shards, and
returns those results to the client.
• For a query that selects user_id and also performs a sort:
mongos can make a straightforward translation of this
operation into a number of queries against the relevant shards, ordered by
user_id. When the sorted queries return from all shards, the mongos merges the
sorted results and returns the complete result to the client.
• For queries that select on last_login:
These queries must run on all shards: mongos must parallelize the query over the shards and perform a merge sort on the email field of the documents found.
84. How does MongoDB sort queries in sharded environments?
If you call the cursor.sort() method on a query in a sharded environment, the mongod for each shard will sort its results, and the mongos merges each shard’s results before returning them to the client.
85. How does MongoDB ensure unique _id field values when using a shard key other than _id?
If you do not use _id as the shard key, then your application/client layer must be responsible for keeping the _id field unique. It is problematic for collections to have duplicate _id values. If you’re not sharding your collection by the _id field, then you should be sure to store a globally unique identifier in that field. The default BSON ObjectId works well in this case.
86. I’ve enabled sharding and added a second shard, but all the data is still on one server. Why?
First, ensure that you’ve declared a shard key for your
collection. Until you have configured the shard key, MongoDB will not create
chunks, and sharding will not occur. Next, keep in mind that the default chunk
size is 64 MB. As a result, in most situations, the collection needs to have at
least 64 MB of data before a migration will occur.
Additionally, the system which balances chunks among the
servers attempts to avoid superfluous migrations. Depending on the number of
shards, your shard key, and the amount of data, systems often require at least
10 chunks of data to trigger migrations.
You can run db.printShardingStatus()
to see all the chunks present in your cluster.
87. How does mongos use connections?
Each client maintains a connection to a mongos instance.
Each mongos instance maintains a pool of connections to the members of a
replica set supporting the sharded cluster. Clients use connections between
mongos and mongod instances one at a time. Requests are not multiplexed or
pipelined. When client requests complete, the mongos returns the connection to
the pool.
88. Why does mongos hold connections open?
mongos uses a set of connection pools to communicate with
each shard. These pools do not shrink when the number of clients decreases.
This can lead to an unused mongos with a large number of
open connections. If the mongos is no longer in use, it is safe to restart the
process to close existing connections.
89. What does writebacklisten in the log mean?
The writeback listener is a process that opens a long poll
to relay writes back from a mongod or mongos after migrations to make sure they
have not gone to the wrong server. The writeback listener sends writes back to
the correct server if necessary.
These messages are a key part of the sharding infrastructure
and should not cause concern.
90. How should administrators deal with failed migrations?
Failed migrations require no administrative intervention. Chunk migrations always preserve a consistent state. If a migration fails to complete for some reason, the cluster retries the operation. When the migration completes successfully, the data resides only on the new shard.
91. When do the mongos servers detect config server changes?
mongos instances maintain a cache of the config database
that holds the metadata for the sharded cluster. This metadata includes the
mapping of chunks to shards.
mongos updates its cache lazily by issuing a request to a
shard and discovering that its metadata is out of date. There is no way to
control this behavior from the client, but you can run the flushRouterConfig command against any mongos to
force it to refresh its cache.
92. Is it possible to quickly update mongos servers after updating a replica set configuration?
The mongos instances will detect these changes without intervention over time. However, if you want to force a mongos to reload its configuration, run the flushRouterConfig command against each mongos directly.
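For example, while connected to a mongos:
db.adminCommand( { flushRouterConfig: 1 } );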
93. What does the maxConns setting on mongos do?
The maxIncomingConnections option limits the number of connections accepted by mongos. If your client driver or application creates a large number of connections but allows them to time out rather than closing them explicitly, then it might make sense to limit the number of connections at the mongos layer.
Set maxIncomingConnections to a value slightly higher than
the maximum number of connections that the client creates, or the maximum size
of the connection pool. This setting prevents the mongos from causing
connection spikes on the individual shards. Spikes like these may disrupt the
operation and memory allocation of the sharded cluster.
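For example, one way to set the limit is on the mongos command line; the value and the config server address below are illustrative:
mongos --configdb cfg0.example.net:27019 --maxConns 20000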
94. How do indexes impact queries in sharded systems?
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard, and the shard will use the index that allows it to fulfill the query most efficiently.
95. Can shard keys be randomly generated?
Shard keys can be random. Random keys ensure optimal distribution of data across the cluster. Sharded clusters attempt to route queries to specific shards when queries include the shard key as a parameter, because these directed queries are more efficient. In many cases, random keys can make it difficult to direct queries to specific shards.
96. Can shard keys have a non-uniform distribution of values?
Yes. There is no requirement that documents be evenly distributed by the shard key. However, documents that have the same shard key must reside in the same chunk and therefore on the same server. If your sharded data set has too many documents with the exact same shard key, you will not be able to distribute those documents across your sharded cluster.
97. Can you shard on the _id field?
You can use any field for the shard key. The _id field is a common shard key.
Be aware that ObjectId() values, which are the default value of the _id field, increment as a timestamp. As a result, when used as a shard key, all new documents inserted into the collection will initially belong to the same chunk on a single shard. Although the system will eventually divide this chunk and migrate its contents to distribute data more evenly, at any moment the cluster can only direct insert operations at a single shard. This can limit the throughput of inserts. If most of your write operations are updates, this limitation should not impact your performance. However, if you have a high insert volume, this may be a limitation.
To address this issue, MongoDB 2.4 provides hashed shard keys.
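For example, a minimal sketch using a hashed shard key; the database and collection names are illustrative:
sh.enableSharding( "mydb" );
// Hash the _id values so inserts spread across chunks instead of hitting one shard:
sh.shardCollection( "mydb.events", { _id: "hashed" } );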
98. What do moveChunk commit failed errors mean?
At the end of a chunk migration, the shard must connect to the config database to update the chunk’s record in the cluster metadata. If the shard fails to connect to the config database, MongoDB reports the following errors:
ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>
ERROR: TERMINATING
When this happens, the primary member of the shard’s replica set terminates to protect data consistency. If a secondary member can access the config database, data on the shard becomes accessible again after an election. The user will need to resolve the chunk migration failure independently.
99. How does draining a shard affect the balancing of uneven chunk distribution?
The sharded cluster balancing process controls both migrating chunks from decommissioned shards (i.e. draining) and normal cluster balancing activities. Consider the following behaviors for different versions of MongoDB in situations where you remove a shard in a cluster with an uneven chunk distribution:
• After MongoDB 2.2, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.
• Before MongoDB 2.2, the balancer handles the uneven chunk distribution and then removes the chunks from the draining shard.
100. What kinds of replication does MongoDB support?
MongoDB supports master-slave replication and a variation on
master-slave replication known as replica sets. Replica sets are the
recommended replication topology.
101. Does replication work over the Internet and WAN connections?
Yes. For example, a deployment may maintain a primary and
secondary in an East-coast data center along with a secondary member for
disaster recovery in a West-coast data center.
102. Can MongoDB replicate over a “noisy” connection?
Yes, but not without connection failures and the obvious latency.
Members of the set will attempt to reconnect to the other members of the set in response to networking flaps. This does not require administrator intervention. However, if the network connections among the nodes in the replica set are very slow, it might not be possible for the members of the set to keep up with the replication.
If the TCP connection between the secondaries and the primary instance breaks, a replica set will automatically elect one of the secondary members of the set as primary.
103. What is the preferred replication method: master/slave or replica sets?
New in version 1.8. Replica sets are the preferred
replication mechanism in MongoDB. However, if your deployment requires more
than 12 nodes, you must use master/slave replication.
104. What is the preferred replication method: replica sets or replica pairs?
Deprecated since version 1.6. Replica sets replaced replica
pairs in version 1.6. Replica sets are the preferred replication mechanism in
MongoDB.
105. Why use journaling if replication already provides data redundancy?
Journaling facilitates faster crash recovery. Prior to
journaling, crashes often required database repairs or full data resync. Both
were slow, and the first was unreliable. Journaling is particularly useful for
protection against power failures, especially if your replica set resides in a
single data center or power circuit.
When a replica set runs with journaling, mongod instances
can safely restart without any administrator intervention.
Note: Journaling requires some resource overhead for write
operations. Journaling has no effect on read performance, however.
Journaling is enabled by default on all 64-bit builds of
MongoDB v2.0 and greater.
106. Are write operations durable if write concern does not acknowledge writes?
Yes. However, if you want confirmation that a given write has arrived at the server, use write concern. After the default write concern change, the default write concern acknowledges all write operations, and unacknowledged writes must be explicitly configured.
Changed in version 2.6: The mongo shell now defaults to use
safe writes.
A new protocol for write operations integrates write
concerns with the write operations. Previous versions issued a getLastError command after a write to specify a
write concern.
107. How many arbiters do replica sets need?
Some configurations do not require any arbiter instances. Arbiters vote in elections for primary but do not replicate the data like secondary members.
Replica sets require a majority of the remaining nodes present to elect a primary. Arbiters allow you to construct this majority without the overhead of adding replicating nodes to the system.
There are many possible replica set architectures. A replica set with an odd number of voting nodes does not need an arbiter.
A common configuration consists of two replicating nodes that include a primary and a secondary, as well as an arbiter for the third node. This configuration makes it possible for the set to elect a primary in the event of failure, without requiring three replicating nodes.
You may also consider adding an arbiter to a set if it has
an equal number of nodes in two facilities and network partitions between the
facilities are possible. In these cases, the arbiter will break the tie between
the two facilities and allow the set to elect a new primary.
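For example, to add an arbiter from a mongo shell connected to the primary; the hostname and port are illustrative:
rs.addArb( "arbiter.example.net:30000" );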
108. What information do arbiters exchange with the rest of the replica set?
Arbiters never receive the contents of a collection but do
exchange the following data with the rest of the replica set:
• Credentials used to authenticate the arbiter with the
replica set. All MongoDB processes within a replica set use keyfiles. These
exchanges are encrypted.
• Replica set configuration data and voting data. This
information is not encrypted. Only credential exchanges are encrypted.
If your MongoDB deployment uses SSL, then all communications
between arbiters and the other members of the replica set are secure. Run all
arbiters on secure networks, as with all MongoDB components.
109. Which members of a replica set vote in elections?
All members of a replica set, unless the value of votes is equal to 0, vote in elections. This includes all delayed, hidden and secondary-only members, as well as the arbiters.
Additionally, the state of a voting member also determines whether the member can vote. Only voting members in the following states are eligible to vote:
• PRIMARY
• SECONDARY
• RECOVERING
• ARBITER
• ROLLBACK
110. Do hidden members vote in replica set elections?
Hidden members of replica sets do vote in elections. To
exclude a member from voting in an election, change the value of the member’s
votes configuration to 0.
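For example, a minimal sketch of that reconfiguration; the member's position in the members array is illustrative:
cfg = rs.conf();
cfg.members[2].votes = 0;   // strip the vote from the member at index 2
rs.reconfig(cfg);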
111. Is it normal for replica set members to use different amounts of disk space?
Yes. Factors including different oplog sizes, different levels of storage fragmentation, and MongoDB’s data file preallocation can lead to some variation in storage utilization between nodes. Storage use disparities will be most pronounced when you add members at different times.
112. What are memory mapped files?
A memory-mapped file is a file with data that the operating
system places in memory by way of the mmap() system call. mmap() thus maps the
file to a region of virtual memory. Memory-mapped files are the critical piece
of the storage engine in MongoDB. By using memory mapped files MongoDB can
treat the contents of its data files as if they were in memory. This provides
MongoDB with an extremely fast and simple method for accessing and manipulating
data.
113. How do memory mapped files work?
Memory mapping assigns files to a block of virtual memory
with a direct byte-for-byte correlation. Once mapped, the relationship between
file and memory allows MongoDB to interact with the data in the file as if it
were memory.
114. How does MongoDB work with memory mapped files?
MongoDB uses memory mapped files for managing and
interacting with all data. MongoDB memory maps data files to memory as it
accesses documents. Data that isn’t accessed is not mapped to memory.
115. What are page faults?
Page faults can occur as MongoDB reads from or writes data
to parts of its data files that are not currently located in physical memory.
In contrast, operating system page faults happen when physical memory is
exhausted and pages of physical memory are swapped to disk.
If there is free memory, then the operating system can find
the page on disk and load it to memory directly. However, if there is no free
memory, the operating system must:
• find a page in memory that is stale or no longer needed,
and write the page to disk.
• read the requested page from disk and load it into memory.
This process, particularly on an active system, can take a long time, especially in comparison to reading a page that is already in memory.
116. What is the difference between soft and hard page faults?
Page faults occur when MongoDB needs access to data that
isn’t currently in active memory. A “hard” page fault refers to situations when
MongoDB must access a disk to access the data. A “soft” page fault, by
contrast, merely moves memory pages from one list to another, such as from an
operating system file cache. In production, MongoDB will rarely encounter soft
page faults.
117. What tools can I use to investigate storage use in MongoDB?
The db.stats() method in the mongo shell returns the current state of the “active” database. The dbStats command document describes the fields in the db.stats() output.
118. What is the working set?
The working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database.
If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause documents in the working set to “page out,” or to be removed from physical memory by the operating system. The next time MongoDB needs to access these documents, MongoDB may incur a hard page fault.
For best performance, the majority of your active set should fit in RAM.
119. Why are the files in my data directory larger than the data in my database?
The data files in your data directory, which is the /data/db
directory in default configurations, might be larger than the data set inserted
into the database. Consider the following possible causes:
• Preallocated data files.
In the data directory, MongoDB preallocates data files to a
particular size, in part to prevent file system fragmentation.
MongoDB names the first data file <databasename>.0, the next <databasename>.1, etc. The first file mongod allocates is 64 megabytes, the next 128 megabytes, and so on, up to 2 gigabytes, at which point all subsequent files are 2 gigabytes. The data files include files with allocated space but that hold no data.
mongod may allocate a 1 gigabyte data file that may be 90%
empty. For most larger databases, unused allocated space is small compared to
the database.
On Unix-like systems, mongod preallocates an additional data
file and initializes the disk space to 0. Preallocating data files in the
background prevents significant delays when a new database file is next
allocated.
You can disable preallocation by setting preallocDataFiles to false. However, do not disable preallocDataFiles in production environments: only use preallocDataFiles for testing and with small data sets where you frequently drop databases.
On Linux systems you can use hdparm to get an idea of how
costly allocation might be:
time hdparm --fallocate $((1024*1024)) testfile
• The oplog.
If this mongod is a member of a replica set, the data
directory includes the oplog.rs file, which is a preallocated capped collection
in the local database. The default allocation is approximately 5% of disk space
on 64-bit installations. In most cases, you should not need to resize the
oplog. However, if you do, see Change the Size of the Oplog.
• The journal.
The data directory contains the journal files, which store
write operations on disk prior to MongoDB applying them to databases.
• Empty records.
MongoDB maintains lists of empty records in data files when
deleting documents and collections. MongoDB can reuse this space, but will
never return this space to the operating system.
To de-fragment allocated storage, use compact, which
de-fragments allocated space. By de-fragmenting storage, MongoDB can
effectively use the allocated space. compact requires up to 2 gigabytes of
extra disk space to run. Do not use compact if you are critically low on disk
space.
Important: compact only removes fragmentation
from MongoDB data files and does not return any disk space to the operating
system.
To reclaim deleted space, use repairDatabase, which rebuilds the database, de-fragmenting the storage, and may release space to the operating system. repairDatabase requires up to 2 gigabytes of extra disk space to run. Do not use repairDatabase if you are critically low on disk space.
Warning: repairDatabase requires enough
free disk space to hold both the old and new database files while the repair is
running. Be aware that repairDatabase will block all other operations and may
take a long time to complete.
120. How can I check the size of a collection?
To view the size of a collection and other information, use
the db.collection.stats() method from
the mongo shell. The following example issues db.collection.stats() for the
orders collection:
db.orders.stats();
To view specific measures of size, use these methods:
• db.collection.dataSize():
data size in bytes for the collection.
• db.collection.storageSize():
allocation size in bytes, including unused space.
• db.collection.totalSize():
the data size plus the index size in bytes.
• db.collection.totalIndexSize():
the index size in bytes.
Also, the following scripts print the statistics for each database and for each collection, respectively:
// Print statistics for each database:
db._adminCommand("listDatabases").databases.forEach(function (d) {
    mdb = db.getSiblingDB(d.name);
    printjson(mdb.stats());
})
// Print statistics for each collection in each database:
db._adminCommand("listDatabases").databases.forEach(function (d) {
    mdb = db.getSiblingDB(d.name);
    mdb.getCollectionNames().forEach(function (c) {
        printjson(mdb[c].stats());
    })
})
121. How can I check the size of indexes?
To view the size of the data allocated for an index, use one
of the following procedures in the mongo shell:
• Use the db.collection.stats()
method using the index namespace. To retrieve a list of namespaces, issue the
following command:
db.system.namespaces.find()
• Check the value of indexSizes in the output of the
db.collection.stats() command.
Example
Issue the following command to retrieve index namespaces:
db.system.namespaces.find()
The command returns a list similar to the following:
{"name" : "test.orders"}
{"name" : "test.system.indexes"}
{"name" : "test.orders.$_id_"}
View the size of the data allocated for the orders.$_id_
index with the following sequence of operations:
use test
db.orders.$_id_.stats().indexSizes
122. How do I know when the server runs out of disk space?
If your server runs out of disk space for data files, you
will see something like this in the log:
Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072
Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds
Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072
Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds
The server remains in this state forever, blocking all writes, including deletes. However, reads still work. To delete some data and compact the database using the compact command, you must first restart the server.
If your server runs out of disk space for journal files, the
server process will exit. By default, mongod creates journal files in a
sub-directory of dbPath named journal. You may elect to put the journal files
on another storage device using a filesystem mount or a symlink.
Note:
If you place the journal files on a separate storage device you will not be
able to use a file system snapshot tool to capture a valid snapshot of your
data files and journal files.
123. Should you run ensureIndex() after every insert?
No. You only need to create an index once for a single
collection. After initial creation, MongoDB automatically updates the index as
data changes.
While running ensureIndex() is usually ok, if an index
doesn’t exist because of ongoing administrative work, a call to ensureIndex()
may disrupt database availability. Running ensureIndex() can render a replica
set inaccessible as the index creation is happening.
124. How do you know what indexes exist in a collection?
To list a collection’s indexes, use the db.collection.getIndexes() method or a similar
method for your driver.
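For example, assuming an orders collection:
db.orders.getIndexes();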
125. How do you determine the size of an index?
To check the sizes of the indexes on a collection, use db.collection.stats().
126. What happens if an index does not fit into RAM?
When an index is too large to fit into RAM, MongoDB must
read the index from disk, which is a much slower operation than reading from
RAM. Keep in mind an index fits into RAM when your server has RAM available for
the index combined with the rest of the working set.
In certain cases, an index does not need to fit entirely
into RAM.
127. How do you know what index a query used?
To inspect how MongoDB processes a query, use the explain()
method in the mongo shell, or in your application driver.
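For example, assuming an orders collection with a status field:
db.orders.find( { status: "A" } ).explain();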
128. How do you determine what fields to index?
A number of factors determine what fields to index,
including selectivity, fitting indexes into RAM, reusing indexes in multiple
queries when possible, and creating indexes that can support all the fields in
a given query.
129. How do write operations affect indexes?
Any write operation that alters an indexed field requires an
update to the index in addition to the document itself. If you update a
document that causes the document to grow beyond the allotted record size, then
MongoDB must update all indexes that include this document as part of the
update operation. Therefore, if your application is write-heavy, creating too
many indexes might affect performance.
130. Will building a large index affect database performance?
Building an index can be an IO-intensive operation,
especially if you have a large collection. This is true on any database system
that supports secondary indexes, including MySQL. If you need to build an index
on a large collection, consider building the index in the background.
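For example, a background build; the collection and field names are illustrative:
db.records.ensureIndex( { user_id: 1 }, { background: true } );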
If you build a large index without the background option,
and if doing so causes the database to stop responding, do one of the
following:
• Wait for the index to finish building.
• Kill the current operation (see db.killOp()). The partial index will be deleted.