• Speaker: Mikael Ronstrom, PhD, the creator of the Cluster engine
  • Explains the cluster structure
  • Aspects of performance
    • Response times
    • Throughput
    • Low variation of response times
  • Improving performance
    • use the low-level API (NDB API) – expensive and hard to develop against
    • use new features in MySQL Cluster Carrier Grade Edition 6.3 (currently 6.3.13), more on this later
    • proper partitioning of tables, minimize communication
    • use of hardware
  • NDB API is a C++ record access API
    • supports sending parallel record operations within the same transaction or in different transactions
    • asynchronous and synchronous
    • NDB kernel is programmed entirely asynchronously
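
For orientation, here's a minimal sketch of a single synchronous NDB API insert. The connect string, database, table, and column names (mgmhost, test_db, t1, pk, a) are placeholders of mine, not from the talk, and error handling is trimmed:

    #include <NdbApi.hpp>  // NDB API header shipped with MySQL Cluster

    int main()
    {
        ndb_init();                                   // initialize the NDB API library
        Ndb_cluster_connection conn("mgmhost:1186");  // management server (placeholder)
        if (conn.connect(4, 5, 1) != 0) return 1;     // retries, delay, verbose
        conn.wait_until_ready(30, 0);                 // wait for the data nodes

        Ndb ndb(&conn, "test_db");                    // database (placeholder)
        ndb.init();

        NdbTransaction *tx = ndb.startTransaction();
        NdbOperation *op = tx->getNdbOperation("t1"); // table (placeholder)
        op->insertTuple();                            // record-level insert
        op->equal("pk", 1);                           // primary key column
        op->setValue("a", 42);                        // non-key column
        tx->execute(NdbTransaction::Commit);          // one synchronous round trip
        ndb.closeTransaction(tx);

        ndb_end(0);
        return 0;
    }
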
  • Looking at performance
    • Five synchronous insert transactions – 10x TCP/IP time cost
    • Five inserts in one synchronous transaction – 2x TCP/IP time cost
    • Five asynchronous insert transactions – 2x TCP/IP time cost
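
The last pattern is where the asynchronous API pays off: prepare several independent transactions, then send and poll for all of them at once, so the five commits share roughly one batched network round trip instead of five sequential ones. A sketch under the same placeholder names as above:

    #include <NdbApi.hpp>
    #include <cstdio>

    // Completion callback invoked by the NDB API for each transaction.
    static void insert_done(int result, NdbTransaction *tx, void * /*arg*/)
    {
        if (result != 0)
            fprintf(stderr, "insert failed: %s\n", tx->getNdbError().message);
    }

    // Assumes an initialized Ndb object, as in the previous sketch.
    void five_async_inserts(Ndb &ndb)
    {
        NdbTransaction *txs[5];
        for (int i = 0; i < 5; i++) {
            txs[i] = ndb.startTransaction();
            NdbOperation *op = txs[i]->getNdbOperation("t1");
            op->insertTuple();
            op->equal("pk", i);
            op->setValue("a", i * 10);
            // Queue the commit locally; nothing is sent yet.
            txs[i]->executeAsynchPrepare(NdbTransaction::Commit, insert_done, NULL);
        }
        // Flush all five transactions and wait for their completions.
        ndb.sendPollNdb(3000 /* ms timeout */, 5 /* wake when all 5 complete */);
        for (int i = 0; i < 5; i++)
            ndb.closeTransaction(txs[i]);
    }
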
  • Case study
    • develop prototype using MySQL C API – performance X, response time Y
    • develop same functionality using synchronous NDB API – performance 3X, response time ~0.5Y
    • develop same functionality using asynchronous NDB API – performance 6X, response time ~0.25Y
  • Conclusion on when to use NDB API
    • performance is critical – raw speed, response time, etc.
    • queries are not very complex
  • Conclusion on when not to use NDB API
    • when design time is critical
    • when complex queries are executed, the MySQL optimizer may handle them better
  • New features of MySQL Cluster Carrier Grade Edition 6.3.13
    • polling-based communication
      • CPU used heavily even at lower throughput
      • avoids interrupt and wake-up delays for new messages
      • some good results in benchmarks
      • decreases performance when CPU is the limiting factor
      • 10% performance improvement on 2, 4, and 8 data node clusters
      • 20% improvement if using Dolphin Express
    • epoll replacing select system calls (Linux)
      • improved performance 20% on a 32-node cluster
    • send buffer gathering
    • real-time scheduler for threads
    • lock threads to CPU
    • distribution awareness
      • 100-200% improvement when application is distribution aware
    • avoid read before UPDATE/DELETE with PK
      • UPDATE t SET a=const1 WHERE pk=x;
      • no need to do a read before UPDATE, all data is already known
      • ~10% improvement
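
The same principle is visible at the NDB API level: a primary-key update names the row and supplies all new column values up front, so no read has to happen first. A sketch, again with placeholder names:

    #include <NdbApi.hpp>

    // Blind PK update: the kernel never has to fetch the old row first.
    void update_by_pk(Ndb &ndb, int pk_value, int new_a)
    {
        NdbTransaction *tx = ndb.startTransaction();
        NdbOperation *op = tx->getNdbOperation("t1"); // table (placeholder)
        op->updateTuple();                            // write, not read-modify-write
        op->equal("pk", pk_value);                    // locate the row by primary key
        op->setValue("a", new_a);                     // all new data supplied up front
        tx->execute(NdbTransaction::Commit);
        ndb.closeTransaction(tx);
    }
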
  • old 'truths' revisited
    • previous recommendation was to run 1 data node per computer
    • this was due to bugs, which are now fixed
  • partitioning tricks
    • if a table gets a lot of index scans (on non-primary-key indexes), partitioning it to live in only one node group can be a good idea
    • partition syntax for this (in CREATE TABLE or ALTER TABLE): PARTITION BY KEY (id) (PARTITION p0 NODEGROUP 0);
  • new performance features in MySQL Cluster 5.0
    • lock pages in main memory – ensures no swapping occurs in the NDB kernel
    • batching IN (…) with primary keys
      • 100x SELECT * FROM t WHERE pk=x;
      • SELECT * FROM t WHERE pk IN (x1, …, x100)
      • IN-statement is around 10x faster
    • use of multi-INSERT
      • similar 10x speedup
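
Both speedups come from the same batching idea. The NDB API analogue (my mapping, not the talk's example) is to define many PK operations in one transaction and execute them together, so they travel in batched messages rather than one round trip each:

    #include <NdbApi.hpp>
    #include <cstdio>

    // Read 100 rows by primary key with one batched execute (placeholder names).
    void batched_pk_reads(Ndb &ndb)
    {
        NdbTransaction *tx = ndb.startTransaction();
        NdbRecAttr *vals[100];
        for (int i = 0; i < 100; i++) {
            NdbOperation *op = tx->getNdbOperation("t1");
            op->readTuple(NdbOperation::LM_Read);   // shared read lock
            op->equal("pk", i);                     // key for row i
            vals[i] = op->getValue("a", NULL);      // column to fetch
        }
        tx->execute(NdbTransaction::Commit);        // all 100 reads batched
        for (int i = 0; i < 100; i++)
            printf("a=%u\n", vals[i]->u_32_value());
        ndb.closeTransaction(tx);
    }
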
  • new features in MySQL Cluster CGE version 6.4 (beta, currently only available via BitKeeper)
    • multi-threaded data nodes – currently no benefit on DBT2, but a 40% throughput increase in some NDB API benchmarks
    • DBT2 improvements to follow later
  • use of hardware, CPU choice
    • Pentium D @ 2.8GHz -> Core 2 Duo @ 2.8GHz => 75% improvement
    • doubling L2 cache doubles thread scalability
    • choice of Dolphin Express interconnect increases throughput 10-400%
  • scalability of DBT2 threads
    • 1-2-4 threads – linear
    • 4-8 threads – 40-70%
    • 8-16 threads – 10-30%
    • decreasing scalability over 16 threads
  • current recommendation by Mikael himself: use twice as many SQL nodes as data nodes
  • future software performance improvements
    • batched key access – 0-400% performance improvement
    • improved scan protocol – ~15% improvement
    • incremental backups
    • optimized backup code
    • parallel I/O on index scans using disk data
  • Niagara-II benchmark from 2002
    • simple read, simple update, both transactional
    • 72-CPU Sun Fire 15K, 256GB RAM
    • CPUs: UltraSPARC III @ 900MHz
    • 32-node NDB Cluster, 1 data node locked to 1 CPU
    • db size 88GB, 900 million records
    • simple reads: 1.5 million reads per second
    • simple updates: 340,000 per second
  • Everyone is overwhelmed, so no questions are asked