The Developer Day | Staying Curious

Jan 15, 2014

Optimizing NGINX TLS Time To First Byte

A very interesting article by Ilya Grigorik on optimizing nginx TLS time to first byte.
Quick summary:

  • Use nginx 1.5.7 or above due to an SSL buffering issue in 1.4
  • Apply an MTU patch to nginx to reduce round trips
  • Enable TLS False Start in nginx to reduce RTTs further
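For reference, the nginx side of these tweaks might look roughly like this (a sketch only: the certificate paths and cipher string are placeholders, ssl_buffer_size requires nginx 1.5.9+, and browsers only do False Start when forward-secrecy ciphers are negotiated and a protocol is advertised via NPN):

```nginx
server {
    listen 443 ssl spdy;   # advertising a protocol via NPN helps enable TLS False Start
    ssl_certificate     /etc/nginx/ssl/example.crt;   # placeholder path
    ssl_certificate_key /etc/nginx/ssl/example.key;   # placeholder path

    # Forward-secrecy ciphers, which browsers require for False Start
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM:EECDH+AES:EDH+AES:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    ssl_session_cache shared:SSL:10m;  # resume sessions, saving a round trip
    ssl_buffer_size 8k;                # smaller TLS records lower time to first byte
}
```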


May 31, 2013

Build and publish your own PHP Mongo packages

I had a problem at work: we were using version 1.3.7 of the PHP Mongo driver, which was crashing for us due to a bug in the driver. They have since released version 1.4.0, which fixes that issue but introduces a new one. Fortunately it was quickly fixed on GitHub, but the fix is not yet released on PECL. I wanted to try the latest driver to see if it works, but did not want to compile it by hand; I wanted to install it via PECL.

This is when I found a howto on creating a PEAR repository on GitHub. Following those instructions I set up my own PEAR repository. Then I simply checked out the PHP Mongo driver, changed its package.xml to point to my channel, and ran pear package, which gave me a PHP Mongo driver package that I could add to my own PEAR GitHub repo using Pirum.

Now I can use this repo to install my own packages for testing, share it with others, or even use it in production if I can't wait for the official release.
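Roughly, the steps above as commands (a sketch; the repository path, channel URL and package version are placeholders, not taken from the original post):

```shell
# Build a PEAR package from a checkout of the driver
# (after pointing package.xml at your own channel)
pear package                                  # produces mongo-X.Y.Z.tgz

# Add the package to a Pirum-managed repository and regenerate it
pirum add /path/to/pear-repo mongo-X.Y.Z.tgz
pirum build /path/to/pear-repo

# Install from your channel on any machine
pear channel-discover example.github.io/pear
pear install example/Mongo
```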


Mar 7, 2013

Cleaning Invalid UTF-8 characters in PHP

I ran into an ugly issue where I had to discard invalid UTF-8 characters from a string before passing it to json_decode(), which otherwise fails to decode it. First I discovered that it's possible to ignore invalid UTF-8 characters using:

iconv("UTF-8", "UTF-8//IGNORE", $text)

However, it turns out this has been broken for ages, and using //IGNORE produces an E_NOTICE. Luckily I found a comment which suggests a workaround:

ini_set('mbstring.substitute_character', 'none');
$text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

This however was not enough. Some of the characters were non-printable UTF-8 characters, and json_decode() was failing on them as well. To work around this I used:

$text = preg_replace('/[^\pL\pN\pP\pS\pZ\pM]/u', '', $text);

This will remove new lines as well, which is fine for me. You can also try removing non-printable byte sequences.
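Putting the two workarounds together, a small helper might look like this (a sketch; the function name is mine, not from the original post):

```php
<?php
// Drop invalid UTF-8 and non-printable characters before json_decode().
function clean_utf8(string $text): string
{
    // Drop invalid UTF-8 byte sequences instead of substituting them.
    ini_set('mbstring.substitute_character', 'none');
    $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

    // Keep only letters, numbers, punctuation, symbols, separators and
    // marks; this strips control characters, including newlines.
    return preg_replace('/[^\pL\pN\pP\pS\pZ\pM]/u', '', $text);
}

// "\xB1" is an invalid UTF-8 byte, "\x07" a non-printable control char.
echo clean_utf8("ok\xB1 \x07json"), "\n"; // prints "ok json"
```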


Mar 9, 2012

Making QCon better

This post is my rant on how to make QCon, or any developer conference for that matter, a better conference. I attended it for the first time in 2012 and was expecting a great deal, since it's probably the best there is, with opening keynotes from people like Martin Fowler.

Quality Over Quantity

In my eyes it's much better to have one day's worth of talks with exclusive speakers than three days of very average content with average presenters. QCon had 3 days, 7 parallel tracks and 6 presentations in each of them. That's 18 presentations for one visitor to see, or over a hundred presentations in total. However, judging from Twitter, it seems to me that only ~10 of them were very well received. Some were so good that there was not even a spot left to stand.

This leads me to suggest that these massive conferences should be stricter about who gets to speak. I would suggest accepting only speakers who have spoken at smaller conferences at least 5 times. There could be selective early bird invites to review the talks before they get accepted; maybe you could get a discount if you agree to do so. The author would obviously have to make a recording of his talk.

Talk Titles and Descriptions

So many times I have felt fooled and lured in by a catchy title, only to struggle to stay awake during the talk. I guess there's nothing wrong with having a catchy title. However, I would suggest that each accepted speaker record an introductory video on who he is and what he's going to speak about. That way you can get a feel for a person before you go see him. And if the above suggestions were followed, links to videos of his previous talks should be included as well, so viewers could review how good the speaker is.

Another great addition would be to mark how advanced each talk is. Just add a simple tag: introductory, advanced, expert. However, I would argue that QCon should not have any introductions to topics like NoSQL at all.

Hack Fest Anyone?

My colleague suggested a somewhat crazy idea: organize a hackfest so that people could do some coding together, solve some challenges, get to know each other, compete, get EXCITED. Have you seen the movie The Social Network, where devs competing for a job at Facebook had to hack a Linux box while drinking vodka shots? Something in that direction, maybe?

A conference for everyone

This is an experimental thought, also suggested by my colleague. There's a pattern that developers love talks about organizational change. It's weird, but they applaud talks on how to have a great team and great processes more than talks about a new tool. However, when they go back to their organization it's very difficult to apply these ideas, because most of the time you'll be met with outright scepticism and denial. What if your product managers, your QAs, your sysadmins and maybe even your CEO found it worthwhile to go to such a conference? How awesome would it have been if all these different roles could have seen the GitHub talk together with you? Or would you be interested in hearing from CEOs who were once developers, how they became who they are and what they've learned along the way?

Sponsor Giveaways

Always organise giveaways at the end of the conference in one big go rather than at random times throughout. This will keep people from going home early: they'll stay in the main hall, get a few beers, maybe meet someone. It's also more convenient for the visitors.

Don’t kill the messenger

This whole rant might make me sound like an ungrateful bastard, but it's not like that. I do appreciate people trying and doing what they do. The QCon that exists right now is still infinitely better than no QCon at all. However, like everything else, it can be even greater.


Mar 9, 2012

My top 5 QCon London 2012 talks

I attended the conference for the first time and wanted to share my top list of talks. There might have been other good ones, but I didn't get a chance to see them and therefore can't comment on them. All of these talks will be released on InfoQ over the next 6 months for your viewing.

#1 Lock-free Algorithms for Ultimate Performance

by Martin Thompson and Michael Barker. Talk Download slides

This has to be my favourite talk. These guys know their hardware well. They showed how it's possible to create lock-free algorithms which run in nanoseconds if you apply some mechanical sympathy to your code.

#2 Decisions, decisions

by Dan North Talk Download slides

A really entertaining, insightful and somewhat shocking talk. Not an unusual talk by Dan North in any way. He talks about decisions (obviously) and how each one of them is a trade-off. Sometimes we forget to weigh these trade-offs.

#3 How GitHub Works

by Zach Holman Talk Slides will be available later.

A great talk, and visually beautiful as well. It even has some singing in it! Zach is a wonderful presenter and explains how GitHub does what it does so well. Their approach is very unusual, alien to the one we see in the corporate world, yet it works so well.

#4 Developers Have a Mental Disorder

by Greg Young Talk This guy does not need slides!

A non-technical talk by a very technical person. It was Greg's comeback to the QCon stage, and a great one. He delivers a speech on many of the disorders developers carry with them in their work. The best one so far: solving problems no one has.

#5 Concurrent Programming Using The Disruptor

by Trisha Gee Talk Download slides

It would be enough to say this talk was delivered by a beautiful lady to a room packed with geeks, but it deserves more praise. It's a quite technical talk on an innovative and great piece of software, or rather a framework: the Disruptor, a concurrent programming framework. If you're unfamiliar with it, or even more so unfamiliar with concurrent programming, you might find it interesting.

Special Mention – Scalable Internet Architectures

by Theo Schlossnagle Talk Slides will be available later.

I didn't attend this one myself, but I have heard that it was great. I only managed to see a few minutes of it, in which the author said: "Don't be a fucking idiot!". This sentence alone makes a talk good.


Feb 28, 2012

Continuous Delivery / Ninja Deployments

Recently I attended the PHPUK 2012 conference, where I went to see the talk "To a thousand servers and beyond: scaling a massive PHP application" by Nikolay Bachiyski. The talk itself was more about how wordpress.com scales to serve its massive load, but what got me interested enough to write this blog post is how wordpress.com does deployments.

There are two parts to WordPress. One is the blog software you can download and host on your own servers. The other is wordpress.com, where you can create an account and set up a blog on their servers. The two are developed and released separately. What we found out from the talk is that about 50 developers have access to the wordpress.com codebase, can make changes, and do about 100 commits to trunk a day.

Now the interesting part: every commit to trunk is an actual deployment to the live platform. And it's super crazy fast: it takes 8 seconds for them to deploy wordpress.com to 3 datacenters. Note that 100 commits equates to 100 deploys a day. And they don't have a QA team, a testable environment or a staging environment. Crazy if you ask me, but apparently it works for them. They serve hundreds of billions of pageviews and manage to keep the platform stable.

When asked, Nikolay explained that it's a much better strategy for them than going into two weeks of merging nightmares where all new changes are merged into a stable branch. I think merge nightmares are as extreme as ninja deployments from trunk. I believe in a balanced approach, and I think we've managed to achieve it at AOL with our own projects.

A Different Approach

We use an internally built tool which tracks, on top of SVN, all the changes made to different branches and makes it easy to move those changesets from one branch to another. Every project repository has three branches: trunk, testable and stable. When a developer wants to make a new commit, he commits with a comment like "#123 > comment message", which assigns the commit to a specific ticket number in our ticket system. If a dev needs to make 10 commits, he makes all of them against the same ticket number. Once he's done, he uses the internal tool to mark the set of changesets he made as resolved.

The QAs can now take all of those changesets and merge them into the testable branch when they feel they're ready to test, again via our internal tool. The smart thing here is that the tool detects all possible conflicts by dry-running the merge and warns you which tickets conflict with which. 95% of the time, conflicts happen because people try to merge newer changesets before older ones. Even then, a lot of the time it's possible to merge ignoring the conflict without it causing any trouble later on.

We try to maintain discipline and push things in the order they were developed. Still, conflicts do happen; it's unavoidable. For that we have a separate role: a release manager, who is responsible for solving these merge conflicts. Usually they're very minor; he quickly catches the dev responsible for the changeset and they work it out. The release manager is also the one who controls what goes into stable and then deploys to live with the click of a button.

Before we had this tool we lived in the nightmare merge world. But no more. We're actually managing to deliver continuously, deploying a few times a day. It also allows our QAs to have a controlled environment with only the changes they want. Yes, it takes an extra role, but that's a minor cost considering the other two extremes. I believe this is a much more balanced approach, one that can and does make the business owners happy and the developers less suicidal.

P.S. The tool described was developed by one of our developers, and last time I checked he was seriously considering making it open source, but wants to polish it a bit further first.


Sep 1, 2011

Dumping Memcache Keys

Sometimes it's useful to be able to quickly peek at which keys memcache is storing and how old they are. A good use case, for example, is checking whether something is cached, or that keys expire as they should.

At first I found a way to dump memcache keys through telnet. However, if a memcache instance is fairly large and has a lot of slabs and thousands of keys, it becomes impractical to do it manually.
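For context, the telnet approach uses memcached's text protocol: "stats items" lists the slab ids, and "stats cachedump <slab id> <limit>" dumps keys from one slab. A rough session might look like this (slab id, counts and timestamps are examples only):

```
$ telnet 127.0.0.1 11211
stats items
STAT items:17:number 3
...
stats cachedump 17 100
ITEM 1_uk_xml [3178 b; 1389700000 s]
...
quit
```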

I wrote a simple utility that helps me find keys across all memcache slabs.

#!/usr/bin/php
<?php
$host = "127.0.0.1";
$port = 11211;
$lookupKey = "";
$limit = 10000;

$time = time();

foreach ($argv as $key => $arg) {
    switch ($arg) {
        case '-h':
            $host = $argv[$key + 1];
            break;
        case '-p':
            $port = $argv[$key + 1];
            break;
        case '-s':
            $lookupKey = $argv[$key + 1];
            break;
        case '-l':
            $limit = $argv[$key + 1];
    }
}

$memcache = memcache_connect($host, $port);

$list = array();
$allSlabs = $memcache->getExtendedStats('slabs');
$items = $memcache->getExtendedStats('items');

foreach ($allSlabs as $server => $slabs) {
    foreach ($slabs as $slabId => $slabMeta) {
        if (!is_numeric($slabId)) {
            continue;
        }
    
        $cdump = $memcache->getExtendedStats('cachedump', (int)$slabId, $limit);
        
        foreach ($cdump as $server => $entries) {
            if (!$entries) {
                continue;
            }
            
            foreach($entries as $eName => $eData) {
                $list[$eName] = array(
                    'key' => $eName,
                    'slabId' => $slabId,
                    'size' => $eData[0],
                    'age' => $eData[1]
                );
            }
        }
    }
}

ksort($list);

if (!empty($lookupKey)) {
     echo "Searching for keys that contain: '{$lookupKey}'\n";
     foreach ($list as $row) {
        if (strpos($row['key'], $lookupKey) !== FALSE) {
            echo "Key: {$row['key']}, size: {$row['size']}b, age: ", ($time - $row['age']), "s, slab id: {$row['slabId']}\n";
        }
     }
} else {
    echo "Printing out all keys\n";
    foreach ($list as $row) {
        echo "Key: {$row['key']}, size: {$row['size']}b, age: ", ($time - $row['age']), "s, slab id: {$row['slabId']}\n";
    } 
}

This script accepts 4 parameters:

-h host
-p port
-s partial search string
-l a limit of how many keys to dump from a single slab (default 10,000)

The easiest way to use it:

./membrowser.php -s uk
Searching for keys that contain: 'uk'
Key: 1_uk_xml, size: 3178b, age: 1728s, slab id: 17
Key: 2_uk_xml, size: 3178b, age: 1725s, slab id: 17
Key: 3_uk_xml, size: 3178b, age: 1721s, slab id: 17

Download memcache keys dump script.

P.S. Some of the code I copied from a 100days.de blog post.


Since 9.04 Jaunty Jackalope, Ubuntu ships with EXT4 as the default file system. Surprisingly, it makes MySQL writes extremely slow. This post is targeted at developers who work on Linux using MySQL and who would like to optimize MySQL performance.

Disk Performance Tuning

Start by tuning your disk performance. To do that you'll have to sacrifice data consistency for write speed. First enable journal_data_writeback on your partition. This allows data to be written to disk before the EXT4 journal is updated. If your box crashes before the journal is updated, you might lose new data or some deleted data might reappear.

sudo tune2fs -o journal_data_writeback /dev/sda1 (use the right partition)

The next step is editing your /etc/fstab to change the ext4 mount options. My fstab file looks something like this:

UUID=irrelevant / ext4 errors=remount-ro,noatime,nodiratime,data=writeback,barrier=0,nobh,commit=100,nouser_xattr 0 1

There are a few non-default options added to improve write performance at the cost of consistency. Journal data writeback is enabled by data=writeback. The main option slowing down MySQL is barrier=0; you could change this single option alone and MySQL write performance would increase dramatically, though disabling barriers makes your new data less safe when a system crash happens. The nobh option tries to avoid associating buffer heads and offers a minor performance improvement. The commit=100 option means your updates are written to disk every 100 seconds; the default is 5 seconds, so if your machine crashes you're likely to lose up to 100 seconds of updates, but large commit values provide big performance improvements. And the last option, nouser_xattr, disables extended user attributes on your filesystem and provides a minor performance boost.

Double check your /etc/fstab syntax and reboot.

Tuning MySQL configuration

MySQL configuration settings depend on what database engines you’re using. The most common ones are MyISAM and InnoDB. I will assume that you use both.

Warning! Some of these configuration changes will or might make your database inaccessible. Therefore back up all your databases by dumping them to SQL in a safe location. Make sure to include triggers and stored procedures. Double-check that you will be able to reimport your backups, and only then proceed. Some options will make your InnoDB database stop working; I'll mark those. Also back up your MySQL configuration, just in case.

MySQL settings depend on how much memory you have. I will assume a normal workstation has 4GB of RAM. Open your MySQL configuration file, which on Ubuntu is located at /etc/mysql/my.cnf, and set the following options.

transaction-isolation = READ-COMMITTED

As a developer you will probably not have transactions running in parallel. If you don't care about transactions and still use InnoDB, set the isolation level to READ-COMMITTED. This makes your transactions see only committed data, but won't prevent phantom rows. Setting it to READ-COMMITTED will also improve performance.

key_buffer = 512M

By far the most important option for MyISAM. MyISAM indexes are cached in the key buffer. It's usually a good bet to set it to 25% to 40% of available memory. As a developer you might not need that much, but do not leave it at the default.

query_cache_size = 256M

Caches query results. Especially useful if your applications don’t have caching.

innodb_buffer_pool_size = 1024M (requires a backup and an import)

The InnoDB buffer pool size is the most important option for InnoDB. If your whole database is InnoDB, you can try to fit the whole database in memory. If you don't have that much memory, you can generally set it to 70%–80% of available memory. On a development box you will probably want to leave extra RAM for things like Gnome or your IDE.

innodb_additional_mem_pool_size = 32M
innodb_log_buffer_size = 4M
innodb_log_file_size = 128M

innodb_flush_log_at_trx_commit = 2

This option tells InnoDB not to flush log data to disk at every commit; with a value of 2, the log is flushed roughly once a second. On development machines this is fine, because the only risk is losing the last second or so of transactions during a system crash, and if your development machine crashes you probably won't care about lost transactions. Experiment!

innodb_flush_method = O_DIRECT

This option tells InnoDB to skip the filesystem cache and write straight to disk, since InnoDB already has its own cache: the buffer pool. You save yourself some RAM.

table_cache = 1024

Caches open tables. Might not be very useful on a single dev box but useful in general on any database server.

myisam_use_mmap = 1

Mmap is a MyISAM feature available since MySQL 5.1. It should improve MyISAM read/write performance by ~6%.

To sum up all the settings on a 4GB work environment:

transaction-isolation = READ-COMMITTED
key_buffer = 512M
query_cache_size = 256M
innodb_buffer_pool_size = 1024M
innodb_additional_mem_pool_size = 32M
innodb_log_buffer_size = 4M
innodb_log_file_size = 128M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
table_cache = 1024
myisam_use_mmap = 1

Buy an SSD disk

This is by far the best upgrade you can make. An SSD has no moving mechanical parts, so a random read or write is as fast as a sequential one. My work laptop, a Lenovo T400, can push 3.5 MB/s of random writes, 35 MB/s of sequential writes, 2.6 MB/s of random reads and 38 MB/s of sequential reads. The same test with an SSD can push 220 MB/s of random writes and 330 MB/s of random reads, with similar numbers for sequential reads and writes. So for I/O access you can expect a 10–100x performance difference.

Summary

It's easy to squeeze some extra performance out of your development environment by sacrificing data safety. In my case these changes made our database integration test suites run a lot quicker. So far I haven't experienced any downsides from the above settings, though you have to accept that one day you most likely will. Most of the settings I've mentioned are among those considered first when tuning production database servers. My final advice: take everything you read here with a pinch of salt, as I am by far not an expert in these matters and everything listed here is gathered from various resources online.

Resources

InnoDB performance optimization basics
Tuning MySQL server after installation
MyISAM MMAP feature
MySQL transaction isolation levels
Why you should ignore key cache hit ratio
Tweaks to boost EXT4 performance
SSD Benchmarks


Jul 21, 2010

CQRS Resources

In the past I wrote about what CQRS is, and now I am adding a list of the CQRS resources known to me. If you come across any other CQRS resources online, please post a comment with your link. Thank you.

Video Presentations / Interviews

Greg Young on Unshackle Your Domain
Udi Dahan on CQRS, DDD, NServiceBus
Udi Dahan on CQRS and Domain Models
Greg Young on Architectural Innovation, Eventing and Event Sourcing
Greg Young on CQRS and Event Sourcing: The Business Perspective
Udi Dahan on CQRS
Udi Dahan on CQRS, Race Conditions, Sagas
Udi Dahan on CQRS, Event Sourcing
CQRS/DDD by Greg Young at Professional.NET 2011 in Vienna
Practical CQRS by Rinat Abdullin

Articles / Blogs / Blog Posts

Greg Young’s Blog – a lot of posts on CQRS and related topics.
Think Before Coding – blog posts on CQRS and related topics
CQRS isn’t the answer by Udi Dahan.
Clarified CQRS by Udi Dahan
CQRS a la Greg Young by Mark Nijhof
Brownfield CQRS by Richard Dingwall.
Transitioning from DDD lite by Julien Letrouit
Why I Love CQRS
CQRS on Cloud by Rinat Abdullin

Frameworks, Code Examples

C# CQRS Example by Mark Nijhof
C# CQRS Framework
Java Axon Framework
Lokad CQRS Framework
NCQRS Framework
Kitchen Example

Other

CQRS mailing list
DDD Mailing List – usually lots of conversations on CQRS


Jul 5, 2010

What is CQRS?

CQRS is a software architecture pattern whose name stands for Command Query Responsibility Segregation. The pattern was named by Greg Young, who first described it on his blog:

I am going to throw out a quick pattern name for this and call it Command and Query Responsibility Segregation or CQRS as we are in fact simply taking what would usually be one object and splitting its responsibilities into two objects.

At the time of writing, CQRS does not have an official definition. It's difficult to define CQRS in a way that is both simple and useful. To describe CQRS at the object level, I've come up with a definition which is just a reworded sentence from Greg Young's blog post:

Command Query Responsibility Segregation or CQRS is the creation of two objects where there was previously one. The separation occurs based upon whether the methods are a command or a query.

CQRS can also be defined at a higher level. Greg Young was kind enough to provide a definition:

Command Query Responsibility Segregation or CQRS is the recognition that there are differing architectural properties when looking at the paths for reads and writes of a system. CQRS allows the specialization of the paths to better provide an optimal solution.

The CQRS pattern is similar to Meyer's CQS, but also different. CQS separates command methods that change state from query methods that read state. CQRS goes further and separates the command methods that change state and the query methods that read it into two different objects.
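To make the distinction concrete, here is a minimal sketch in PHP (the class names and the in-memory store are mine, purely for illustration):

```php
<?php
// CQS style: one object whose methods are split into commands and queries.
class CustomerService
{
    private array $preferred = [];

    public function makeCustomerPreferred(int $id): void   // command: changes state
    {
        $this->preferred[$id] = true;
    }

    public function isCustomerPreferred(int $id): bool     // query: reads state
    {
        return $this->preferred[$id] ?? false;
    }
}

// CQRS style: the same responsibilities split into two objects. Here both
// share one store for brevity; in practice each side could have its own model.
class CustomerStore
{
    public array $preferred = [];
}

class CustomerCommands
{
    public function __construct(private CustomerStore $store) {}

    public function makeCustomerPreferred(int $id): void
    {
        $this->store->preferred[$id] = true;
    }
}

class CustomerQueries
{
    public function __construct(private CustomerStore $store) {}

    public function isCustomerPreferred(int $id): bool
    {
        return $this->store->preferred[$id] ?? false;
    }
}

$store = new CustomerStore();
(new CustomerCommands($store))->makeCustomerPreferred(42);
var_dump((new CustomerQueries($store))->isCustomerPreferred(42)); // bool(true)
```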

Benefits of CQRS

  • The simplest benefit of CQRS is that it simplifies the read and write models by separating them. The write model no longer contains queries, and developers can focus directly on domain model behaviours. What would otherwise be a repository with hundreds of different read methods, mixed with different lazy loading, pre-fetch and paging strategies, can now be hidden away in a separate read model.
  • Another reason is Divergent Change. Divergent change occurs when one class is commonly changed in different ways for different reasons. You might be modifying queries more often than commands, which might break not only your read queries but your commands as well. By keeping them separated you minimise the risk of both being broken.
  • The single most important benefit of CQRS is that by separating read and write models you can make different choices on different models. For example you may optimize your write model for write performance and your read system for read performance.
  • Another nice feature of CQRS is the option to easily distribute work across separate teams. For example, the read part of a web e-shop application can be outsourced to less expensive developers offshore.
  • Event sourcing is a different pattern which shares a strong symbiotic relationship with CQRS. Once your system reaches an architectural level where you need multiple data models, it might, and probably will, introduce synchronization issues: it becomes impossible to say which model is incorrect. In an event-centric system where commands are translated into events by the domain model, these events can be used as the primary data model. This not only solves data synchronization issues, but also significantly improves testing by allowing you to test for "what didn't happen", and it opens easy doors for integration, since other systems can now listen to the events published by the domain model.
  • Eventual Consistency. In very simple terms, eventual consistency can be thought of as a form of caching. In event-centric systems it is possible to delay the handling of published domain model events and handle them in a different thread or process. This makes the write and read data models temporarily inconsistent, but it might significantly improve the performance of your commands.
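The event-centric flow from the last two points can be sketched roughly like this (all class names are illustrative, not from any framework):

```php
<?php
// An event produced by the domain model; the event stream is the
// primary data model in an event-sourced system.
final class CustomerRenamed
{
    public function __construct(public int $id, public string $name) {}
}

// Write side: the domain model turns commands into events.
class Customer
{
    private array $pendingEvents = [];

    public function rename(int $id, string $name): void
    {
        $this->pendingEvents[] = new CustomerRenamed($id, $name);
    }

    public function releaseEvents(): array
    {
        $events = $this->pendingEvents;
        $this->pendingEvents = [];
        return $events;
    }
}

// Read side: a projection built by replaying events. Handling can be
// deferred to another thread or process, which is where eventual
// consistency comes from.
class CustomerNames
{
    public array $names = [];

    public function apply(CustomerRenamed $event): void
    {
        $this->names[$event->id] = $event->name;
    }
}

$customer = new Customer();
$customer->rename(1, 'Alice');

$projection = new CustomerNames();
foreach ($customer->releaseEvents() as $event) {
    $projection->apply($event);
}
echo $projection->names[1], "\n"; // prints "Alice"
```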

In Conclusion

CQRS is a very interesting pattern. Some may even consider it a silver bullet. It isn't. Like all patterns, CQRS has trade-offs. It may be difficult to sell CQRS to management, since it's not a well-known classic approach to software architecture and the tools and technologies around it are less known. As an example, in the PHP world there are currently no mature service buses such as NServiceBus in the .NET world. And it is almost impossible, or more often than not simply not worth the return on investment, to migrate legacy apps to CQRS.

