TAG | MySQL

Nov/10

Optimizing MySQL on Ubuntu 10.10 Maverick

7 Comments | Posted by Žilvinas Šaltys in Linux, MySQL, Optimization

Since Ubuntu 9.04 Jaunty Jackalope Ubuntu ships with EXT4 as the default file system. Surprisingly it makes MySQL writes extremely slow. This post is targeted to developers who work on Linux using MySQL and who would like to optimize MySQL performance.

Disk Performance Tuning

First start by tuning your disk performance. To do that you’ll have to sacrifice data consistency over data write speed. First start by enabling journal_data_writeback on your partition. This will allow to write to disk before updating the EXT4 journal. If your box crashes before updating the journal you might loose new data or some deleted data might reappear.

sudo tune2fs -o journal_data_writeback /dev/sda1 (use the right partition)

Next step is editing your /etc/fstab to change ext4 mounting options. My fstab file looks something like this:

UUID=irrelevant / ext4 errors=remount-ro,noatime,nodiratime,data=writeback,barrier=0,nobh,commit=100,nouser_xattr 0 1

There’s a few non default options added to improve write performance over consistency. Journal data writeback is enabled by data=writeback. The main option which is slowing down MySQL is barrier=0. You could actually change this single option and MySQL write performance would increase dramatically. Disabling this option makes your new data less safe when a system crash happens. Option nobh tries to avoid associating buffer heads and offers a minor performance improvement. Another option commit=100 says that all your updates are written to disk every 100 seconds. The default is 5 seconds. If your machine crashes you’re likely to loose 100 seconds of updates. Large commit values like 100 provide big performance improvements. And the last option nouser_xattr disables extended options on your filesystem and provides a minor performance boost.

Double check your /etc/fstab syntax and reboot.

Tuning MySQL configuration

MySQL configuration settings depend on what database engines you’re using. The most common ones are MyISAM and InnoDB. I will assume that you use both.

Warning! Some of the configuration changes will or might make your database inaccessible. Therefore backup all your databases by dumping them to SQL to a safe location. Make sure to include triggers and stored procedures. Double check that you will be able to reimport your backups and only then proceed further. Some options will make your InnoDB database stop working. I’ll mark those. Also backup your MySQL configuration. Just in case.

MySQL settings depend on how much memory you have. I will assume a normal working station will have 4GB of RAM. Open your MySQL configuration file which on Ubuntu is located at /etc/mysql/my.cnf and set the following options.

transaction-isolation = READ-COMMITTED

As a developer you will probably not have transactions running in parallel. If you don’t care about transactions and still use InnoDB set the isolation level to READ-COMMITED. This will make your transactions only see committed data but won’t prevent phantom rows. Setting it to READ-COMMITED will also improve performance.

key_buffer = 512M

By far the most important option for MyISAM. MyISAM indexes are cached using in the key buffer. It’s usually a good bet to set it from 25% to 40% of memory available. As a developer you might not need that much but do not leave it at a default.

query_cache_size = 256M

Caches query results. Especially useful if your applications don’t have caching.

innodb_buffer_pool_size = 1024M (requires a backup and an import)

InnoDB buffer pool size is the most important option for InnoDB. If your whole database is InnoDB you can try and fit your whole database in memory. If you don’t have that much memory you can generally set 70% - 80% of memory available. On a development box you will probably want to have extra RAM for things like Gnome or your IDE.

innodb_additional_mem_pool_size = 32M
innodb_log_buffer_size = 4M
innodb_log_file_size = 128M

innodb_flush_log_at_trx_commit = 2

This option tells InnoDB to only flush log data every two seconds. On development machines you can set this even higher because the only risk is losing transactions during a system crash. If your development machine crashes you probably won’t care about lost transactions. Experiment!

innodb_flush_method = O_DIRECT

This options tells InnoDB to skip filesystem cache and write straight to disk since InnoDB already has it’s own cache - the buffer pool. You save yourself some RAM.

table_cache = 1024

Caches open tables. Might not be very useful on a single dev box but useful in general on any database server.

myisam_use_mmap = 1

Mmap is a new MyISAM feature available with MySQL 5.1. Should improve MyISAM write/read performance ~6%.

To sum up all the settings on a 4GB work environment:

transaction-isolation = READ-COMMITTED
key_buffer = 512M
query_cache_size = 256M
innodb_buffer_pool_size = 1024M
innodb_additional_mem_pool_size = 32M
innodb_log_buffer_size = 4M
innodb_log_file_size = 128M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
table_cache = 1024
myisam_use_mmap = 1

Buy an SSD disk

This is by far the best upgrade you can do. SSD does not have any moving mechanical parts therefore doing a random read or write is as fast as doing a sequential read or write. My work laptop Lenovo T400 can push 3.5 MB with random writes, 35 MB with sequential writes, 2.6MB with random reads and 38MB with sequential reads per second. The same test with an SSD disk can push 220MB random writes and 330MB random reads with similar numbers for sequential reads and writes. So for IO access you can expect 10 - 100 times performance difference.

Summary

It’s easy to squeeze some extra performance out of your development environment by sacrificing data safety. In my case these changes made our database integration test suites run a lot quicker. So far I haven’t experienced any downsides from the above settings though you have to accept that one day it most likely will. Most of the database settings I’ve mentioned are those considered most when tuning production database servers. My final advice is take everything you read here with a pinch of salt as I am by far not an expert in these matters and everything listed here is gathered from various resources online.

Resources

InnoDB performance optimization basics
Tunning MySQL server after installation
MyISAM MMAP feature
MySQL transaction isolation levels
Why you should ignore key cache hit ratio
Tweaks to boost EXT4 performance
|SSD Benchmarks

innodb, Linux, MyISAM, MySQL, performance, Tuning, Ubuntu Hide

Dec/09

PyDumpy - Partial sorted MySQL database dumps

0 Comments | Posted by Žilvinas Šaltys in MySQL, Tools

is a simple Python utility that might be helpful for developers struggling to get fast and partial database snapshots from production databases. It does it’s job by checking the database information schema to find out the approximate rows count available in each table and limits the table if needed to avoid dumping too much data as some databases may have hundreds of gigabytes of data. It then passes all the limits information it gathers to mysqldump a tool created by MySQL to do the actual dumping.

Python does not have a built in package to connect to MySQL as for example PHP does and therefore PyDumpy relies on MySQL for Python package to work. PyDumpy also relies on mysqladmin to do the actual dumping.

PyDumpy is very simple to use. For example to dump a maximum of 50 000 rows from each table type:

./pydumpy.py -H host -u user -p pass -n dbname -limit=50000

PyDumpy also allows to specify row limits and sorting preferences for each table specifically:

./pydumpy.py -H host -u user -p pass -n dbname -limit=50000 -ask-to-limit -ask-to-sort

If you find this tool useful please feel free to provide feedback by leaving a comment.

dump, MySQL, partial, pydumpy Hide

Mar/09

Building Drizzle on Cygwin or getting as far as possible

0 Comments | Posted by Žilvinas Šaltys in MySQL, Tools

It’s been quite a while i have this sort of desire to offer my help for some opensource project i like. One of my most favorite candidates is Drizzle. I should say my knowledge of C is really poor and there’s a whole crazy world out there full of C applications and build tools.

Nevertheless i decided to atleast try and see if i would be able to build it and maybe change something, run some tests. As I am a Windows user i found out the only way for me to build Drizzle is through Cygwin. I started with installing the latest stable version of Cygwin 1.5.25-15. I must say that their installer is really nice but i would offer to add a package search feature. Might help when you want to install numerous packages.

So what’s next? I found this wiki page about building drizzle and figured first thing i should do is get Bazaar. I installed the following packages using Cygwin installer:

bison
bzr
gettext
readline
libpcre0
pcre
pcre-devel
libtoolize
gperf
e2fsprogs

And then went on to get the Drizzle sources:

mkdir ~/bzrwork
bzr init-repo ~/bzrwork
cd ~/bzrwork
bzr branch lp:drizzle

Now onto building. Here’s where all the fun begins.

Drizzle requires a tool named libevent which is not available through Cygwin installer and you must build it yourself. And still you can’t build libevent with the latest version of Cygwin because it lacks certain functionality. After some googling i found a patched IPV6 version of Cygwin that fixes these issues. Added the #define EAI_SYSTEM 11 to http.c and finally were able to ./configure && make && make install libevent.

You also need protobuf installed. And there’s no package for that either. Actually this protobuf is quite nice stuff. Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.

Now that we seem to have all the packages installed we can start building drizzle. It should be as easy as this:

cd drizzle
./config/autorun.sh
./configure
make
make install

It is not. First to be able to compile Drizzle you need to have gcc4. And even if you do, ./configure must need to know where it is. So we need to use additional flags CC and CXX. Then you need to show ./configure where libevent is installed by adding a flag -with-libevent-prefix=/usr/local or any other place you have it in. I also found a really ugly problem with warnings. I wasn’t able to compile drizzle because it stopped somwhere in gnulib complaining about some warnings that were treated as errors. Funny enough there is a sarcastic option to disable these warnings: -disable-pedantic-warnings. You also probably want to install Drizzle somwhere else than usual by using: -prefix=/some/deploy/dir.

In the end you come up with something like this:

./configure CC=gcc-4 CXX=g++-4 -with-libevent-prefix=/usr/local -disable-pedantic-warnings -prefix=/some/deploy/dir

That’s how far i’ve got with it. Though i’m still not able to compile it. I get an error somwhere in mystrings library that is related to some datatype casting issues. Hopefully i’ll be able to hack through this

c++, cygwin, drizzle, MySQL Hide

Jan/09

Funny MySQL command line option

0 Comments | Posted by Žilvinas Šaltys in MySQL

MySQL has atleast one funny named command line option that made me chuckle. The option is named “-i-am-a-dummy”. From the MySQL manual:

Allow only those UPDATE and DELETE statements that specify which rows to modify by using key values. If you have set this option in an option file, you can override it by using -safe-updates on the command line.

This option is an alias of “-safe-upfates”. I wonder for a while what is the story behind having two alternate names for the same thing. Maybe it’s just the MySQL folks having some fun.

cli, MySQL Hide

Jan/09

MySQL - ORDER BY RAND() optimization

0 Comments | Posted by Žilvinas Šaltys in Lazyweb, MySQL

Interesting read about order by rand optimization by Jan Kneschke. Haven’t used this myself but seems might be rather useful. The performance difference is huge. I wonder if MySQL could optimize ORDER BY RAND() itself when there are no data holes.

MySQL, Optimization, randomization, sorting Hide

Dec/08

Percona offers XtraDB to replace InnoDB

0 Comments | Posted by Žilvinas Šaltys in Lazyweb, MySQL

Who could have guessed Percona would decide to release their own storage engine named XtraDB for MySQL. It’s a drop in replacement for InnoDB. They are forking a version of InnoDB and applying patches of their own. Not only that they are going to put more effort to it to make it something more than just a small variation of InnoDB. Even now it seems to offer better performance and scalability than InnoDB. It also frees XtraDB from Oracle that did an awesome job creating InnoDB but it seems percona want’s to go to the next level. Happy to see XtraDB will be available for others to modify and submit patches. It’s also interesting to find out Percona will be trying to sponsor future changes of XtraDB through their customers.

It is also nice too see that there are some thoughts to merge XtraDB into Drizzle.

innodb, MySQL, percona, xtradb Hide

Dec/08

MySQL Top N Sort Algorithm

0 Comments | Posted by Žilvinas Šaltys in Lazyweb, MySQL

While reading planetmysql I’ve found a very interesting article about Top N Sort algorithm. It would be nice to have this implemented in MySQL. There are millions of applications that have sorted and paginated lists of data. Some of those lists are really big. For example company that I work for has a web application with a report that has milions of rows and it has to be sorted and it only shows a few hundred rows in a single page. This would definitely be a great improvement to MySQL.

algorithm, MySQL, sorting Hide

Dec/08

MySQL character sets and collations

0 Comments | Posted by Žilvinas Šaltys in MySQL, Optimization, PHP

Another great article I’ve found about MySQL character sets and collations that demystifies all the so often found problems in projects made by confused developers. This also can help you to save some space your database takes and improve your queries performance because more of your database can fit into memory.

character-sets, collations, MySQL, sorting Hide

Nov/08

What is the future of relational databases?

0 Comments | Posted by Žilvinas Šaltys in MySQL

A great recorded video keynote from the OpenSQL camp conference featuring Brian Aker. He asks some interesting questions about the state of databases. He has some interesting points about the way we currently use hardware and problems that are at our door and how opensource projects should step up to solve them.

database, MySQL, relational Hide

Nov/08

Common wrong Data Types compilation

0 Comments | Posted by Žilvinas Šaltys in MySQL

Found some nice MySQL data types tips how to create a database schema. Valuable information for intermediate and begginer developers to avoid the common database schema design pitfalls.

data-types, MySQL Hide

The Developer Day | Staying Curious

Blogroll

Archives

TAG | MySQL

Optimizing MySQL on Ubuntu 10.10 Maverick

Disk Performance Tuning

Tuning MySQL configuration

Buy an SSD disk

Summary

Resources

PyDumpy - Partial sorted MySQL database dumps

Building Drizzle on Cygwin or getting as far as possible

Funny MySQL command line option

MySQL - ORDER BY RAND() optimization

Percona offers XtraDB to replace InnoDB

MySQL Top N Sort Algorithm

MySQL character sets and collations

What is the future of relational databases?

Common wrong Data Types compilation

Find it!

Archives

The Developer Day | Staying Curious

Blogroll

Archives

TAG | MySQL

Disk Performance Tuning

Tuning MySQL configuration

Buy an SSD disk

Summary

Resources

Find it!

Tag Cloud

Archives