Tuesday, 21 June 2022

Finding and killing long running SQL queries in AWS RDS

Occasionally, we find sites that are running extremely slowly, and digging down we end up finding MySQL queries that have been running for sometimes days, or even weeks.

These stuck queries end up consuming resources

How to find the stuck queries

The easiest way to find stuck queries is to send the following query to your database

SHOW FULL PROCESSLIST;

This will list out all the queries, what user executed them and when they were first issued.

How to manually kill the stuck queries in AWS RDS MySQL

If you ever find threads stuck in state sending data that will never complete, you will need to kill those queries. Note, this can cause you to corrupt a database if there are ongoing non-SELECT based queries (so execute with caution). In this example the MySQL username is drupal.

SELECT CONCAT('CALL mysql.rds_kill(',id,');') FROM information_schema.processlist

WHERE user='drupal';

This will produce a list of SQL statements to execute to kill all queries by this user.

CALL mysql.rds_kill(18047573);

CALL mysql.rds_kill(18003186);

CALL mysql.rds_kill(18122490);

How to automatically kill long running queries in MySQL

It is all well and good to reactively kill stuck queries, but what if you could automatically kill them? This sounds far more ideal.

Looking at the parameters you can tune in MySQL, the one I will focus on is called max_execution_time.

The MAX_EXECUTION_TIME hint is permitted only for SELECT statements. It places a limit N (a timeout value in milliseconds) on how long a statement is permitted to execute before the server terminates it:

Checking that AWS RDS supports this parameter (using awscli):

aws rds describe-db-parameters --db-parameter-group-name $PARAMETER_GROUP_NAME | jq '.Parameters[] | select(.ParameterName == "max_execution_time")'

So AWS RDS does support this parameter, and it's ApplyType is set to dynamic. This which means any changes are applied immediately (whereas static parameters require a database reboot to take effect).

Changing this parameter can be done via awscli or the UI.

Some Useful Links Related To Stock Market & Investment

15 Best Youtube Channels to Learn Indian Stock Market

https://tradebrains.in/youtube-channels-indian-stock-market/

How To Make A Postgres Database Read Only

In postgres you can make a particular database or all databases of a postgres cluster.

default_transaction_read_only parameter controls whether database is in read only or read write .

default_transaction_read_only -> on ( for read only )
default_transaction_read_only -> off ( for read write )

For making single db read only

–connect to database other than the readonly DB ( here i connected to default postgres db)

$ psql -d postgres

psql (12.4.5)

Type "help" for help.

postgres=# \conninfo

You are connected to database "postgres" as user "enterprisedb" via socket in "/tmp" at port "5444".

postgres=# alter database edbstore set default_transaction_read_only=on;

ALTER DATABASE

— Now restart the postgres cluster(either using pg_ctl or service_name)

[root@localhost ~]# systemctl stop edb-as-12

[root@localhost ~]# systemctl start edb-as-12

— Now login to database and check:

edbstore=# \conninfo

You are connected to database "edbstore" as user "enterprisedb" via socket in "/tmp" at port "5444".

edbstore=# create table test as select * from pg_settings;

ERROR: cannot execute CREATE TABLE AS in a read-only transaction

We can see it is not allowing write operations.

Now if you wish to remove the database from read only mode, then

postgres=# \conninfo

You are connected to database "postgres" as user "enterprisedb" via socket in "/tmp" at port "5444".

postgres=# alter database edbstore set default_transaction_read_only=off;

ALTER DATABASE

[root@localhost ~]# systemctl stop edb-as-12

[root@localhost ~]# systemctl start edb-as-12

For making all the databases of the postgres cluster read only:

postgres=# show default_transaction_read_only;

default_transaction_read_only

-------------------------------

off

(1 row)

postgres=# alter system set default_transaction_read_only=on;

ALTER SYSTEM

-- Restart the pg cluster:

[root@localhost ~]# systemctl stop edb-as-12

[root@localhost ~]# systemctl start edb-as-12

Wednesday, 28 October 2020

Different Technology Interview Q & A Video Links

Linux / Shell Script

https://www.youtube.com/watch?v=6ue2luv2I-Y

https://www.youtube.com/watch?v=l0QGLMwR-lY

https://www.youtube.com/watch?v=hOACNkPpIMI

https://www.youtube.com/watch?v=sdQufg-n4zE

https://www.youtube.com/watch?v=tOcfeQrJL_E

https://www.youtube.com/watch?v=GzIFoJBVwh8

https://www.youtube.com/watch?v=IxApf1YtkJU

https://www.youtube.com/watch?v=WDjbMucvEmk

https://www.youtube.com/watch?v=E5ylHeENN3o

General

https://www.youtube.com/watch?v=mSO37hFadvE

https://www.youtube.com/watch?v=cODCpXtPHbQ

https://www.youtube.com/watch?v=v5e_PasMdXc

https://www.youtube.com/watch?v=-WEpWH1NHGU

https://www.youtube.com/watch?v=KWOSGVtHWqA

Oracle

https://www.youtube.com/watch?v=o6kwgGBr_BE&list=PL3fS-Jh9cS0wIL2B-mhAttFJ8u0ccwMc8&index=3&pbjreload=101

https://www.youtube.com/watch?v=s8m_R9_NtTI

https://www.youtube.com/playlist?list=PLb1qVSx1k1Vr0v4wVyvT3GEuA0J0M4xBm

https://www.youtube.com/c/OracleDatabaseProductManagement/playlists

Postgresql

https://www.youtube.com/playlist?list=PLmhoQ1nd8VnRD4XcLsBooxIvnPpEs67im

https://www.youtube.com/watch?v=ABwD8IYByfk

Cassandra

https://www.youtube.com/watch?v=ZkwkgurQyHg

https://www.youtube.com/watch?v=Mvsy2DZhKSw

https://www.youtube.com/watch?v=ejnS81dABdM&t=670s

https://www.youtube.com/watch?v=PTOxb3a8g1k

MySql

https://www.youtube.com/watch?v=Ly3btyYo5UY&pbjreload=101

https://www.youtube.com/watch?v=sjqyH-o8HlE&list=PLpyc33gOcbVD32OoxHXVzkPP9921roqoV&index=2

https://www.youtube.com/playlist?list=PL03-vZ6WiQ5vaCNgsPopzH5yO4xBQcFX9

GIT

https://www.youtube.com/watch?v=8Xo3l1zv41I

Networking

https://www.youtube.com/watch?v=Eg6ibIGxeGc

Thursday, 24 September 2020

Cloud Computing Related Important Links

AWS

https://aws.amazon.com/professional-services/CAF/

https://aws.amazon.com/security/

https://aws.amazon.com/architecture/well-architected/?wa-lens-whitepapers.sort-by=item.additionalFields.sortDate&wa-lens-whitepapers.sort-order=desc

https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-portfolio-discovery/planning.html

https://aws.amazon.com/solutions/implementations/aws-landing-zone/

https://www.cloudtp.com/doppler/sequencing-and-wave-planning-for-large-scale-application-migration/

https://medium.com/@rebecca_huber/cloud-migration-and-transformation-wave-planning-a1f0e6ff3b3f

https://aws.amazon.com/blogs/database/how-to-solve-some-common-challenges-faced-while-migrating-from-oracle-to-postgresql/

https://d1.awsstatic.com/whitepapers/Migration/oracle-database-amazon-aurora-postgresql-migration-playbook.pdf

https://aws.amazon.com/blogs/database/database-migration-what-do-you-need-to-know-before-you-start/

https://aws.amazon.com/blogs/database/how-to-migrate-your-oracle-database-to-postgresql/

Thursday, 10 September 2020

Important Technology Links of different Topics

Cassandra Topics

https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/slowReads.html

https://blog.pythian.com/cassandra-information-using-nodetool/

https://blog.knoldus.com/learn-cassandra-nodetool-monitormanage/

https://www.vskills.in/certification/tutorial/nodetool/

https://blog.knoldus.com/?s=cassandra

Oracle DB

https://www.hhutzler.de/blog/using-datapatch-rac-env/#

http://oracle-help.com/oracle-rac/

http://rob.lasonder.org/index.php?title=Main_Page

http://www.br8dba.com/

https://oracledbwr.com/category/patching/rac-database/

https://ittutorial.org/exadata-tutorials-oracle-magic-database-machine-exadata-lessons/

https://ittutorial.org/performance-tuning-and-sql-tuning-tutorial-in-the-oracle-database/

https://www.ktexperts.com/daily-tips-for-oracle-dba/

MongoDB

http://tommongo.blogspot.com/

http://tommongo.blogspot.com/2019/01/m103-basic-cluster-administration-mongo.html

https://github.com/AlessandroCorradini/MongoDB-University

Oracle Cloud

https://docs.oracle.com/en/solutions/design-dr/plan-dr-databases1.html#GUID-CCF50BB4-DD5B-4DE9-AD05-4FC475491AF0

AZURE

https://blog.dbi-services.com/using-the-managed-postgresql-service-in-azure/

https://docs.microsoft.com/en-us/learn/certifications/azure-database-administrator-associate?tab=tab-learning-paths

https://docs.microsoft.com/en-us/learn/paths/azure-sql-fundamentals/

MySQL

http://mysqlentomologist.blogspot.com/

https://www.dbrnd.com/2016/02/mysql-opening-and-closing-of-database-connection-is-very-costly/

https://mysqlserverteam.com/category/replication/page/2/

AWS Topics

https://www.percona.com/blog/2018/08/02/amazon-rds-multi-az-deployments-read-replicas/

https://www.cloudtp.com/doppler/designing-database-high-availability-replication-aws/

Postgresql DB

https://severalnines.com/database-blog/understanding-and-reading-postgresql-system-catalog

https://www.enterprisedb.com/blog/postgresql-vs-mysql-360-degree-comparison-syntax-performance-scalability-and-features

https://www3.dbmaestro.com/blog/the-ultimate-oracle-to-postgresql-migration-checklist

https://www.tutorialdba.com/p/postgresql-system-catalog-views.html

https://www.enterprisedb.com/postgres-tutorials/postgresql-replication-and-automatic-failover-tutorial

https://www.digitalocean.com/docs/databases/postgresql/resources/supported-extensions/

https://blog.feedspot.com/postgresql_blogs/

https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan

https://thoughtbot.com/blog/tags/postgres

https://info.crunchydata.com/blog/wheres-my-replica-troubleshooting-streaming-replication-synchronization-in-postgresql

https://valehagayev.wordpress.com/2019/09/30/migrating-from-oracle-to-postgresql-with-ora2pg/

https://www.itopstimes.com/itsec/creating-a-multi-layered-security-architecture-for-your-databases/

https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/#:~:text=In%20fact%2C%20PostgreSQL%20has%20implemented,PostgreSQL%20for%20a%20long%20time.

https://medium.com/leboncoin-engineering-blog/managing-postgresql-backup-and-replication-for-very-large-databases-61fb36e815a0

https://www.enterprisedb.com/blog/the-complete-oracle-to-postgresql-migration-guide-tutorial-move-convert-database-oracle-alternative

https://www.enterprisedb.com/blog/comparing-oracle-edb-postgres

https://www.enterprisedb.com/blog/postgres-table-partitioning

https://www.enterprisedb.com/blog/8-cool-interesting-facts-things-postgresql-can-do

https://www.enterprisedb.com/blog/8-ways-rdbms-postgres-handles-non-relational-data

https://www.enterprisedb.com/blog/why-mastercards-secret-zero-downtime-postgresql

https://www.enterprisedb.com/blog/how-to-secure-postgresql-security-hardening-best-practices-checklist-tips-encryption-authentication-vulnerabilities

https://www.enterprisedb.com/blog/postgresql-query-optimization-performance-tuning-with-explain-analyze

https://www.enterprisedb.com/blog/microsoft-sql-server-mssql-vs-postgresql-comparison-details-what-differences

https://www.enterprisedb.com/blog/postgresql-database-patching-ha-cluster-postgres-update-minor-versions

https://www.enterprisedb.com/blog/how-to-upgrade-postgresql-and-upgrading-methods-costs-risks

https://www.enterprisedb.com/blog/monitoring-postgresql-database-system-activities-performance-system-stats-extension

https://www.enterprisedb.com/blog/postgresql-wal-write-ahead-logging-management-strategy-tradeoffs

https://www.enterprisedb.com/blog/maintaining-postgresql-for-high-performance-what-is-wrong-or-right-what-consider

https://www.enterprisedb.com/blog/postgres-pulse-insights-its-still-slow-solving-bloated-postgres-database

https://www.enterprisedb.com/blog/what-does-database-high-availability-really-mean

https://www.enterprisedb.com/blog/postgresql-database-backup-recovery-what-works-wal-pitr

https://www.enterprisedb.com/blog/how-to-fix-postgresql-index-corruption-deal-repair-rebuild

https://www.enterprisedb.com/blog/top-tools-manage-postgres-enterprise-administration-performance-high-availability-and

https://www.enterprisedb.com/blog/postgresql-pgpool-connection-pool-database-load

https://www.enterprisedb.com/blog/high-availibility-parameters-configuration-streaming-replication-postgresql

https://software.fujitsu.com/jp/manual/manualfiles/m150000/j2ul1738/08enz200/j1738-00-12-01-01.html

https://www.enterprisedb.com/blog/tips-postgres-postgres-insider

https://www.percona.com/blog/2019/02/04/pg_repack-rebuild-postgresql-database-objects-online/

https://luppeng.wordpress.com/2019/11/30/handling-sensitive-data-in-postgresql-row-level-security-column-access-privileges-and-data-masking/

https://www.enterprisedb.com/blog/postgres-and-transparent-data-encryption-tde

https://www.enterprisedb.com/blog/introducing-postgres-pulse-insights-live-qa-postgres-experts

https://www.dbrnd.com/2019/06/postgresql-disable-non-durable-parameters-and-improve-server-performance/

https://www.dbrnd.com/2019/05/postgresql-how-to-remove-accents-diacritic-signs-from-lexemes/

https://www.dbrnd.com/2019/02/postgresql-dont-take-backup-until-your-database-is-changed-check-pg_stat_database-tuple/

https://www.dbrnd.com/2018/04/postgresql-9-5-brin-index-maintenance-using-brin_summarize_new_values-add-new-data-page-in-brin-index/

https://www.dbrnd.com/2018/04/postgresql-use-pgbench-for-testing-the-load-performance-of-server/

https://www.dbrnd.com/2018/03/postgresql-do-before-populating-a-new-database/

https://www.dbrnd.com/2017/12/postgresql-10-important-name-change-from-xlog-to-wal-and-location-to-lsn/

https://www.dbrnd.com/2017/12/postgresql-important-parameters-for-query-planner-configuration-enable_seqscan/

https://www.dbrnd.com/2017/07/postgresql-fatal-error-remaining-connection-slots-are-reserved-for-non-replication-superuser-connections/

https://www.dbrnd.com/2017/11/postgresql-move-table-with-data-from-one-schema-to-another-schema/

https://bucardo.org/

Standby_Lag Calculation --------------- https://www.postgresql.org/message-id/CADKbJJWz9M0swPT3oqe8f9+tfD4-F54uE6Xtkh4nERpVsQnjnw@mail.gmail.com

DB Migration

https://tapoueh.org/blog/2017/07/from-mysql-to-postgresql/

https://www.infinitypp.com/amazon-aws/database-migration-service-mysql-dms/

https://dzone.com/articles/how-to-use-amazon-dms-to-perform-a-database-migrat

https://tutorialsdojo.com/aws-database-migration-service/

https://valehagayev.wordpress.com/2019/09/30/migrating-from-oracle-to-postgresql-with-ora2pg/

https://info.crunchydata.com/blog/setup-ora2pg-for-oracle-to-postgres-migration

https://severalnines.com/database-blog/migrating-oracle-postgresql-what-you-should-know

https://hevodata.com/learn/postgresql-logical-replication/

https://www.enterprisedb.com/blog/the-complete-oracle-to-postgresql-migration-guide-tutorial-move-convert-database-oracle-alternative?gclid=CjwKCAjwzIH7BRAbEiwAoDxxTiTWBr-5DhbQWTNKt5qoL61VKHpUX9cIs1F_3FA_Uv6lyPxJkSEvahoCbEYQAvD_BwE&utm_campaign=India&utm_medium=cpc&utm_source=google

MySql

https://www.percona.com/blog/2020/09/30/mysql-101-tuning-mysql-after-upgrading-memory/

https://www.percona.com/blog/2020/08/27/mysql-8-0-19-innodb-replicaset-configuration-and-manual-switchover/

https://www.percona.com/blog/2020/08/26/creating-an-external-replica-of-aws-aurora-mysql-with-mydumper/

https://www.dbrnd.com/mysql-dba-scripts/

http://howtolamp.com/lamp/mysql/5.6/log-files/

https://blog.jcole.us/

https://www.dbrnd.com/2016/01/mysql-check-optimize-and-repair-tables-using-mysqlcheck/

https://deepakmysqldba.wordpress.com/page/6/

https://www.dbrnd.com/2016/01/mysql-configured-innodb-engine-for-multithreaded-database/

https://www.infoworld.com/article/3210905/10-essential-performance-tips-for-mysql.html

https://www.infoworld.com/cms/article/http:/www.infoworld.com/article/3209665/application-development/21-rules-for-faster-sql-queries.html#tk.ifw-infsb

https://blog.pythian.com/comparing-pgpool-ii-and-pgbouncer/

https://www.dbrnd.com/2016/10/mysql-controlling-query-optimizer-to-choose-the-best-execution-plan-optimizer_prune_level-optimizer_search_depth/

https://www.dbrnd.com/2016/10/mysql-error-2006-hy000-mysql-server-has-gone-away/

https://www.percona.com/blog/2018/10/01/how-to-fix-mysql-replication-after-an-incompatible-ddl-command/

https://www.percona.com/blog/2013/05/09/how-to-create-a-new-or-repair-a-broken-gtid-based-slave-with-percona-xtrabackup/

https://www.percona.com/blog/2014/05/02/how-to-identify-and-cure-mysql-replication-slave-lag/

https://www.percona.com/blog/2013/02/08/how-to-createrestore-a-slave-using-gtid-replication-in-mysql-5-6/

https://www.percona.com/blog/2014/01/28/10-mysql-performance-tuning-settings-after-installation/

Sunday, 6 September 2020

Interview Q and A for Cassandra DB Part - 4

Why can’t I set listen_address to listen on 0.0.0.0 (all my addresses)?

Cassandra is a gossip-based distributed system and listen_address is the address a node tells other nodes to reach it at. Telling other nodes “contact me on any of my addresses” is a bad idea; if different nodes in the cluster pick different addresses for you, Bad Things happen.

If you don’t want to manually specify an IP to listen_address for each node in your cluster (understandable!), leave it blank and Cassandra will use InetAddress.getLocalHost() to pick an address. Then it’s up to you or your ops team to make things resolve correctly (/etc/hosts/, dns, etc).

One exception to this process is JMX, which by default binds to 0.0.0.0 (Java bug 6425769).

I delete data from Cassandra, but disk usage stays the same. What gives?

Data you write to Cassandra gets persisted to SSTables. Since SSTables are immutable, the data can’t actually be removed when you perform a delete, instead, a marker (also called a “tombstone”) is written to indicate the value’s new status. Never fear though, on the first compaction that occurs between the data and the tombstone, the data will be expunged completely and the corresponding disk space recovered. See Compaction for more detail.

Why does nodetool ring only show one entry, even though my nodes logged that they see each other joining the ring?

This happens when you have the same token assigned to each node. Don’t do that.

Most often this bites people who deploy by installing Cassandra on a VM (especially when using the Debian package, which auto-starts Cassandra after installation, thus generating and saving a token), then cloning that VM to other nodes.

The easiest fix is to wipe the data and commitlog directories, thus making sure that each node will generate a random token on the next restart.

Can I change the replication factor (a a keyspace) on a live cluster?

Yes, but it will require running a full repair (or cleanup) to change the replica count of existing data:

Alter the replication factor for desired keyspace (using cqlsh for instance).
If you’re reducing the replication factor, run nodetool cleanup on the cluster to remove surplus replicated data. Cleanup runs on a per-node basis.
If you’re increasing the replication factor, run nodetool repair -full to ensure data is replicated according to the new configuration. Repair runs on a per-replica set basis. This is an intensive process that may result in adverse cluster performance. It’s highly recommended to do rolling repairs, as an attempt to repair the entire cluster at once will most likely swamp it. Note that you will need to run a full repair (-full) to make sure that already repaired sstables are not skipped.

Can I Store (large) BLOBs in Cassandra?

Cassandra isn’t optimized for large file or BLOB storage and a single blob value is always read and send to the client entirely. As such, storing small blobs (less than single digit MB) should not be a problem, but it is advised to manually split large blobs into smaller chunks.

Please note in particular that by default, any value greater than 16MB will be rejected by Cassandra due the max_mutation_size_in_kb configuration of the Cassandra Configuration File file (which default to half of commitlog_segment_size_in_mb, which itself default to 32MB).

Nodetool says “Connection refused to host: 127.0.1.1” for any remote host. What gives?

Nodetool relies on JMX, which in turn relies on RMI, which in turn sets up its own listeners and connectors as needed on each end of the exchange. Normally all of this happens behind the scenes transparently, but incorrect name resolution for either the host connecting, or the one being connected to, can result in crossed wires and confusing exceptions.

If you are not using DNS, then make sure that your /etc/hosts files are accurate on both ends. If that fails, try setting the -Djava.rmi.server.hostname=<public name> JVM option near the bottom of cassandra-env.sh to an interface that you can reach from the remote machine.

Will batching my operations speed up my bulk load?

No. Using batches to load data will generally just add “spikes” of latency. Use asynchronous INSERTs instead, or use true Bulk Loading.

An exception is batching updates to a single partition, which can be a Good Thing (as long as the size of a single batch stay reasonable). But never ever blindly batch everything!

On RHEL nodes are unable to join the ring

Check if SELinux is on; if it is, turn it off.

Why does top report that Cassandra is using a lot more memory than the Java heap max?

Cassandra uses Memory Mapped Files (mmap) internally. That is, we use the operating system’s virtual memory system to map a number of on-disk files into the Cassandra process’ address space. This will “use” virtual memory; i.e. address space, and will be reported by tools like top accordingly, but on 64 bit systems virtual address space is effectively unlimited so you should not worry about that.

What matters from the perspective of “memory use” in the sense as it is normally meant, is the amount of data allocated on brk() or mmap’d /dev/zero, which represent real memory used. The key issue is that for a mmap’d file, there is never a need to retain the data resident in physical memory. Thus, whatever you do keep resident in physical memory is essentially just there as a cache, in the same way as normal I/O will cause the kernel page cache to retain data that you read/write.

The difference between normal I/O and mmap() is that in the mmap() case the memory is actually mapped to the process, thus affecting the virtual size as reported by top. The main argument for using mmap() instead of standard I/O is the fact that reading entails just touching memory - in the case of the memory being resident, you just read it - you don’t even take a page fault (so no overhead in entering the kernel and doing a semi-context switch). This is covered in more detail here.

What are seeds?

Seeds are used during startup to discover the cluster.

If you configure your nodes to refer some node as seed, nodes in your ring tend to send Gossip message to seeds more often (also see the section on gossip) than to non-seeds. In other words, seeds are worked as hubs of Gossip network. With seeds, each node can detect status changes of other nodes quickly.

Seeds are also referred by new nodes on bootstrap to learn other nodes in ring. When you add a new node to ring, you need to specify at least one live seed to contact. Once a node join the ring, it learns about the other nodes, so it doesn’t need seed on subsequent boot.

You can make a seed a node at any time. There is nothing special about seed nodes. If you list the node in seed list it is a seed

Seeds do not auto bootstrap (i.e. if a node has itself in its seed list it will not automatically transfer data to itself) If you want a node to do that, bootstrap it first and then add it to seeds later. If you have no data (new install) you do not have to worry about bootstrap at all.

Recommended usage of seeds:

pick two (or more) nodes per data center as seed nodes.
sync the seed list to all your nodes

Does single seed mean single point of failure?

The ring can operate or boot without a seed; however, you will not be able to add new nodes to the cluster. It is recommended to configure multiple seeds in production system.

Why can’t I call jmx method X on jconsole?

Some of JMX operations use array argument and as jconsole doesn’t support array argument, those operations can’t be called with jconsole (the buttons are inactive for them). You need to write a JMX client to call such operations or need array-capable JMX monitoring tool.

Why do I see “… messages dropped …” in the logs?

This is a symptom of load shedding – Cassandra defending itself against more requests than it can handle.

Internode messages which are received by a node, but do not get not to be processed within their proper timeout (see read_request_timeout, write_request_timeout, … in the Cassandra Configuration File), are dropped rather than processed (since the as the coordinator node will no longer be waiting for a response).

For writes, this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by read repair, hints or a manual repair. The write operation may also have timeouted as a result.

For reads, this means a read request may not have completed.

Load shedding is part of the Cassandra architecture, if this is a persistent issue it is generally a sign of an overloaded node or cluster.

Cassandra dies with java.lang.OutOfMemoryError: Map failed

If Cassandra is dying specifically with the “Map failed” message, it means the OS is denying java the ability to lock more memory. In linux, this typically means memlock is limited. Check /proc/<pid of cassandra>/limits to verify this and raise it (eg, via ulimit in bash). You may also need to increase vm.max_map_count. Note that the debian package handles this for you automatically.

What happens if two updates are made with the same timestamp?

Updates must be commutative, since they may arrive in different orders on different replicas. As long as Cassandra has a deterministic way to pick the winner (in a timestamp tie), the one selected is as valid as any other, and the specifics should be treated as an implementation detail. That said, in the case of a timestamp tie, Cassandra follows two rules: first, deletes take precedence over inserts/updates. Second, if there are two updates, the one with the lexically larger value is selected.

Why bootstrapping a new node fails with a “Stream failed” error?

Two main possibilities:

the GC may be creating long pauses disrupting the streaming process
compactions happening in the background hold streaming long enough that the TCP connection fails

In the first case, regular GC tuning advices apply. In the second case, you need to set TCP keepalive to a lower value (default is very high on Linux). Try to just run the following:

$ sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5

To make those settings permanent, add them to your /etc/sysctl.conf file.

Note: GCE’s firewall will always interrupt TCP connections that are inactive for more than 10 min. Running the above command is highly recommended in that environment.

Q18) Explain what is logging in Cassandra.

Answer:
Each and every log list is stored in logback.xml and you can change any property indiscriminately adopting this file. INFO level logs are written to system.log debug.log include debugging level entries.

Q19) Is HDFS utilized in Cassandra? If yes, where?

Answer:
The adaptable medium of Cassandra file system is termed as HDFS. It enables to operate on huge data with comfort. It has successfully replaced oriental methods of dealing with large quantum of nodal structures.

Q20) Explain the fundamental difference between Cassandra and Hadoop.

Answer:
Hadoop
It is an analytical dashboard that helps manage a large database.
Cassandra
Cassandra is purely NoSQL Database which enables high-speed performance together with executing online trAnswer:action volumes.

Q21) Explain the rudimentary difference between Cassandra and HBase.

Answer:
HBase
It is a multiple discharged database developed on Java and by nature an open source and NoSQL centric. It is used to exemplify online applications.
Cassandra
It’s an operative structure almost resembles HBase but the functionality is better served and attached to it are many advanced features as it was created post observing the fallacies of HBase. So, many technical loopholes have been eliminated, that were a part of HBase.

Q22) Elaborate on Identifiers.

Answer:
These are tokens, using usual expressions. These tokens directly relate to recognizing specific columns, tables and objects.

7) What are the main components of Cassandra?

The components of Cassandra include:

· Node

· Data cluster

· Commit log

· Cluster

· Meme-table

· SSTable

· Bloom filter

8) What are the functions of Cassandra?

Ans: This database supports two main categories of functions:

Scalar functions: Its primary purpose is taking some groups of values and producing an output with it.

Aggregate functions: Its primary function is producing a combined result using selected multiple rows.

15) How is data distribution done?

Ans: Cassandra database is a highly-available database, and it stores data by evenly dividing the data around its nodes. For this, it uses the Murmur3 partitioning function to distribute given data in nodes evenly.

16) How does Cassandra store data?

Ans: The data storage path in Cassandra begins with the memtable where the data is stored temporarily and is also called a commit log. And once committed, the data is periodically flushed and written into SSTable

18) What is a direct request?

Ans: Direct request in Cassandra is a part of the read operation. In this, the coordinator node contacts the replica node.

19) Define digest request?

Ans: When the coordinator node contacts replicas, it actually requests those nodes which reply fastest. Then these contacted nodes respond with a digest of the data required.

20) Explain read repair request?

Ans: When the coordinator node sends requests, it checks in the nodes for any outdated data. This data is sent for a background read and repair and is replaced with the updated data. Read and repair requests, is a method to keep the data updated, and it also makes sure that the requested row is consistent on all replicas.

21) What is a write operation?

Ans: There is step by step operations in writing, which goes as follows.

Step1: It is as soon as it receives its request it sends the data to the commit log to save the data.

Step2: Data is inserted upon request and then sent to commit log to save data.

Step3: If the memtable reaches its limit then data is flushed to SSTable.

23) What do you mean by ACID?

Ans: ACID stands for

Atomicity: Which means either your transaction can fail or commit

Consistency: Its definition changes from software to software or an application to application, but its general meaning is that data has to stay consistent.

Isolation: Data has to be isolated and separated from each other

Durability: It assures you that once the database receives data, it should ensure that the data is processed. So it is an advantage if the database fails, then the data will not be lost.

24) What is BASE?

Ans: Not every application or software needs this strong consistency, so this is where base comes into action. The BASE stands for Basically Available Soft-state Eventually-consistent properties.NoSQL databases basically use these models.

27) What are the best monitor tools of Cassandra?

Ans: Although Cassandra comes with built-in tolerance features, it still needs to be monitored for effective results. Here are some tools which Cassandra uses to monitor its databases:

· Solarwind server and application monitor

· Instana

· Instaclustr

· AppDynamics

· Dynatrace

· Machine engine applications manager.

32) Name the key roles of CQL?

It is very necessary to provide different types of users with different kinds of roles depending upon their requirements. It ensures the security of database users. and their key roles goes as follows:

· Create a role

· Alter role

· Drop role

· Grant role

· Revoke role

· List role

70.Do I need to use a caching layer (like memcached) with Cassandra?

Cassandra negates the need for extra software caching layers like memcached through its distributed architecture, fast write throughput capabilities, and internal memory caching structures.

9). List out some key features of Apache Cassandra?
Ans: It is scalable, fault-tolerant, and consistent.
It is a column-oriented database.
Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
Created at Facebook, it differs sharply from relational database management systems.
Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.
Cassandra is being used by some of the biggest companies such as Facebook, Twitter, Cisco, Rackspace, ebay, Twitter, Netflix, and more.

Links: -

http://www.interviewquestionspdf.com/2015/11/top-50-cassandra-interview-questions.html

https://cassandra.apache.org/doc/latest/faq/

https://mindmajix.com/cassandra-interview-questions-and-answers

https://www.gangboard.com/blog/cassandra-interview-questions-and-answers

https://www.wisdomjobs.com/e-university/apache-cassandra-interview-questions.html

http://geoinsyssoft.com/cassandra-interview-questions/

http://crackyourinterview.com/Ads-Cassandra-Interview-Questions-Part1.aspx

http://pkkjava.blogspot.com/p/cassandra-interview-questions.html

About Me

Tuesday, 21 June 2022

How to find the stuck queries

How to manually kill the stuck queries in AWS RDS MySQL

How to automatically kill long running queries in MySQL

15 Best Youtube Channels to Learn Indian Stock Market

For making single db read only

For making all the databases of the postgres cluster read only:

Wednesday, 28 October 2020

Thursday, 24 September 2020

Thursday, 10 September 2020

Sunday, 6 September 2020

Why can’t I set listen_address to listen on 0.0.0.0 (all my addresses)?

I delete data from Cassandra, but disk usage stays the same. What gives?

Why does nodetool ring only show one entry, even though my nodes logged that they see each other joining the ring?

Can I change the replication factor (a a keyspace) on a live cluster?

Can I Store (large) BLOBs in Cassandra?

Nodetool says “Connection refused to host: 127.0.1.1” for any remote host. What gives?

Will batching my operations speed up my bulk load?

On RHEL nodes are unable to join the ring

Why does top report that Cassandra is using a lot more memory than the Java heap max?

What are seeds?

Does single seed mean single point of failure?

Why can’t I call jmx method X on jconsole?

Why do I see “… messages dropped …” in the logs?

Cassandra dies with java.lang.OutOfMemoryError: Map failed

What happens if two updates are made with the same timestamp?

Why bootstrapping a new node fails with a “Stream failed” error?

Q18) Explain what is logging in Cassandra.

Q19) Is HDFS utilized in Cassandra? If yes, where?

Q20) Explain the fundamental difference between Cassandra and Hadoop.

Q21) Explain the rudimentary difference between Cassandra and HBase.

Q22) Elaborate on Identifiers.

7) What are the main components of Cassandra?

8) What are the functions of Cassandra?

15) How is data distribution done?

16) How does Cassandra store data?

18) What is a direct request?

19) Define digest request?

20) Explain read repair request?

21) What is a write operation?

23) What do you mean by ACID?

24) What is BASE?

27) What are the best monitor tools of Cassandra?

32) Name the key roles of CQL?