Thursday, February 13, 2014

On the road again - FOSSAsia

It has been a busy few months. I have moved from Italy to Thailand, and the move has been my first priority, keeping me from attending FOSDEM and from interacting with social media. Now I am starting to catch my breath and to look around for new events to attend. But before I get into that, let me make a few things clear:

  • I am still working for Continuent. In fact, it is thanks to my company’s flexibility that I could move to a different country (a different continent, six time zones away) without much trouble. Thanks, Continuent! (BTW: Continuent is hiring!)
  • I am still involved with MySQL activities, events, and community matters. I just happen to be in a different time zone, where direct talk with people in Europe and the US needs to happen on a different schedule.

I am already committed to attending Percona Live MySQL Conference & Expo in Santa Clara, where I will present a tutorial on MySQL replication features and a regular session on multi-master topologies with Tungsten.

But in the meantime, Colin encouraged me to submit talk proposals to FOSSAsia, and both my submissions were accepted. So, at the end of February I will be talking about some of my favorite topics:

  • Easy MySQL multi master replication with Tungsten
  • Data in the cloud: mastering the ephemeral

The exact schedule will be announced shortly. I am eager to attend an open source event in Asia. It has been a long time since I went to a similar event in Malaysia, which was very pleasant.

Thursday, January 16, 2014

PerconaLive 2014 program is published

Percona Live MySQL Conference and Expo, April 1-4, 2014

After a few months of submissions and reviews, the program for Percona Live MySQL Conference and Expo 2014 has been published. The conference will be held in Santa Clara, from April 1 to 4, 2014.

Registration with early bird discount is available until February 2nd. If you plan to attend, this is probably the best time to act.

I will be presenting twice at the conference:

  • a tutorial on MySQL replication features;
  • a regular session on multi-master topologies with Tungsten.

Notice that the Call for Participation is still open for lightning talks and BoF sessions. You can submit a talk until the end of January.

Tuesday, January 07, 2014

Multiple masters : attraction to the stars

In the last 10 years I have worked a lot with replication systems, and I have developed a keen interest in the topic of multiple masters in a single cluster. My interest has two distinct origins:

  • On one hand, I have interacted countless times with users who want to use a replication system as a drop-in replacement for a single server. In many cases, especially when users are dealing with applications that are not very flexible or modular, this means that the replication system must have several points of data entry, and such points must work independently and in symbiosis with the rest of the nodes.
  • On the other hand, I am a technology lover (look it up in the dictionary: it is spelled geek), and as such my curiosity is stirred whenever I discover a new way of implementing multi-master systems.

The double nature of this professional curiosity sometimes makes me forget that the ultimate goal of technology is to improve users’ lives. I may fall in love with a cute design or a clever implementation of an idea, but that cleverness must eventually meet with usability, or else it loses its appeal. There are areas where the distinction between usefulness and cleverness is clear-cut. And there are others where we really don’t know where we stand, because there are so many variables involved.

One such case is the star topology, where you have many master nodes connected to each other through a hub. You can consider it a bi-directional master/slave: if you take a master/slave topology and make every node able to replicate back to the master, you have almost a star. To make it complete, you also need to give the hub the ability to broadcast the changes received from the outside nodes, so that every node gets the changes from every other node. Compared to other popular topologies, such as point-to-point all-masters and circular replication, the star topology has the distinct advantage of requiring fewer connections and of making it very easy to add a new node.


Figure #1: Star topology

However, anyone can immediately see one disadvantage of the star topology: the hub is the cornerstone of the cluster. It is a single point of failure (SPOF). If the hub fails, there is no replication anywhere. Period. Therefore, when you are considering a multi-master topology, you have to weigh the advantages and disadvantages of the star, and the SPOF is usually the heaviest element in the balance.

Depending on which technology you choose, though, there is another important element to consider: data must be replicated twice when you use a star topology. It is mostly the same thing that happens in circular replication. If you have nodes A, B, C, and D, and you write data in A, the data is replicated three times before it reaches D (A->B, B->C, and C->D). A star topology is similar: in a system where A, B, and D are terminal nodes and C is the hub, data needs to travel twice before it reaches D (A->C, C->D).

Figure #2: Circular replication
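
To put numbers on both comparisons, here is a back-of-the-envelope sketch of mine (an illustration, not part of any tool): it counts the connections each topology needs, and the worst-case number of hops a change must travel.

#!/bin/bash
# back-of-the-envelope comparison for N master nodes (illustration only)
N=${1:-10}
star_connections=$(( 2 * (N - 1) ))        # each spoke has a two-way link with the hub
all_masters_connections=$(( N * (N - 1) )) # every node replicates to every other node
circular_hops=$(( N - 1 ))                 # A->B, B->C, ...: grows with the cluster
star_hops=2                                # spoke -> hub -> other spoke, regardless of N
echo "connections: star=$star_connections, all-masters=$all_masters_connections"
echo "worst-case hops: circular=$circular_hops, star=$star_hops"

With 10 nodes, that is 18 connections against 90, which is why adding a node to a star is so cheap; the price, as noted above, is the double transfer and the hub as a SPOF.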

This double transfer is bad for two reasons: it affects performance, and it opens the door to unexpected transformations of data. Let’s explore this concept a bit. When we replicate data from a master to a slave, there is little risk of mischief: the data goes straight from the source to a reproducer, and if we use row-based replication, there is little risk of getting the wrong data in the slave. If we make the slave replicate to a further slave, we need to apply the data, generate a further binary log in the slave host, and replicate data from that second binary log. We can deal with that, but at the price of taking more details into account: where the data came from, when to stop replicating in a loop, whether the data was created with a given configuration set, and so on. In short, if your slave server has been configured differently from the master, chances are that the data down the line may be different. In a star topology, this translates into the possibility of data from each spoke being replicated correctly in the hub, but ending up different in the other spokes.
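
In native MySQL replication, that further binary log does not appear by itself: every intermediate node must be configured to produce it. Here is a minimal sketch of the relevant server options (the options are standard; the server-id value is a placeholder):

# settings needed on any node that relays changes (a circle member or a star hub)
# --log-bin:           the node writes its own binary log
# --log-slave-updates: events applied from its master are written to that log too
# --server-id:         must be unique, so a node can recognize and skip its own
#                      events when they travel around a loop
mysqld --server-id=3 --log-bin=mysql-bin --log-slave-updates &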

Compare this with point-to-point all-masters. In this topology, there are no SPOFs. You pay for this privilege by setting up a higher number of connections between nodes (every node must connect to every other node), but there is no second-hand replication: before being applied to the slave services, the data is applied only once, in the originating master.


Figure #3: Point-to-point all-masters topology

Where do I want to go with all the above points? I have reached the conclusion that, much as users like star topologies because of their simplicity, I often find myself recommending the more complex but more solid point-to-point all-masters setup. Admittedly, the risk of data corruption is minimal. The real spoiler in most scenarios is performance. When users realize that the same load flows effortlessly in a point-to-point scenario but causes slave lags in a star topology, the choice is easy to make. If you use row-based replication, which in a complex topology is often a necessary requirement, the lag grows to a point where it becomes unbearable.

As I said in the beginning, it all depends on the use case: if the data load is not too big, a star topology will run just as well as point-to-point, and if the data flow is well designed, the risk of bad data transformations becomes negligible. Yet, the full extent of a star topology’s weaknesses must be taken into account when designing a new system. Sometimes, investing some effort into deploying a point-to-point all-masters topology pays off in the medium to long term. Of course, you can prove that only if you deploy a star and try it out with the same load. If you deploy it in a staging environment, no harm is done. If you deploy it in production, you may regret it. In the end, it all boils down to my mantra: don’t trust the theory, but test, test, test.

Thursday, December 12, 2013

Quick and dirty concurrent operations from the shell

Let’s say that you want to measure something in your database, and for that you need several operations to happen in parallel. If you have a capable programming language at your disposal (Perl, Python, Ruby, PHP, or Java would fit the bill) you can code a test that sends several transactions in parallel.

But if all you have is the shell and the mysql client, things can be trickier. Today I needed such a parallel result, and I only had mysql and bash to accomplish the task.

In the shell, it’s easy to run a loop:

for N in $(seq 1 10)
do
    mysql -h host1 -e "insert into sometable values($N)" 
done

But this runs the queries sequentially, and each session opens and closes before the next one starts, so there is no concurrency at all.
Then I thought: the usual method for parallel execution in the shell is to run things in the background and collect the results afterwards. I just needed a way of keeping the first session open while the others were being started.

Here’s what I did: I ran a loop with a countdown, using the seq command, and I included a sleep statement in each query, waiting for a decreasing number of seconds. If I start with 10 seconds, the first query will sleep for 10 seconds, the second one for 9 seconds, and so on. I run each command in the background, so the sessions eat up their time independently.

#!/bin/bash
mysql -h host1 test -e 'drop table if exists t1'
mysql -h host1 test -e 'create table t1 (i int not null primary key, ts timestamp)'

# countdown: the first session sleeps longest, so it is still open
# while all the others are being launched
for N in $(seq 10 -1 1)
do
    query1="set autocommit=0"                     # keep the transaction open
    query2="insert into test.t1 (i) values($N)"
    query3="select sleep($N) into @a; commit"     # hold the session, then commit
    mysql -h host1 -e "$query1;$query2;$query3" & # launch in the background
done

wait  # let all background sessions finish before reading the results

mysql -h host1 test -e 'select * from t1'

The effect of this small script is that the commits for these 10 commands come at the same time, as you can see from the resulting table:

+----+---------------------+
| i  | ts                  |
+----+---------------------+
|  1 | 2013-12-12 18:08:00 |
|  2 | 2013-12-12 18:08:00 |
|  3 | 2013-12-12 18:08:00 |
|  4 | 2013-12-12 18:08:00 |
|  5 | 2013-12-12 18:08:00 |
|  6 | 2013-12-12 18:08:00 |
|  7 | 2013-12-12 18:08:00 |
|  8 | 2013-12-12 18:08:00 |
|  9 | 2013-12-12 18:08:00 |
| 10 | 2013-12-12 18:08:00 |
+----+---------------------+

This is a very good result, but what happens if I need to run 500 queries simultaneously instead of 10? I don’t want to wait 500 seconds (more than 8 minutes). So I made an improvement:

for N in $(seq 5000 -10 1)
do
    echo $N
    query1="set autocommit=0"
    query2="insert into test.t1 (i) values($N)"
    # lpad pads the counter to 4 digits, so the sleep argument becomes
    # 0.5000, 0.4990, ... down to 0.0010 seconds
    query3="select sleep(concat('0.', lpad($N,4,'0'))) into @a; commit"
    mysql -h host1 -e "$query1;$query2;$query3" &
done

Now each SLEEP command is called with a fractional argument, starting at “0.5000” and continuing with “0.4990”, and so on. You can try it: all 500 rows are committed at the same time.

However, “the same time” is a bit fuzzy. When we use timestamps with second granularity, it is quite easy to show the same time. But with fractional seconds it is a different story. Here is what happens if I use MySQL 5.6 with a timestamp column defined with millisecond precision (TIMESTAMP(3)):

+----+-------------------------+
| i  | ts                      |
+----+-------------------------+
|  1 | 2013-12-12 18:27:24.070 |
|  2 | 2013-12-12 18:27:24.070 |
|  3 | 2013-12-12 18:27:24.069 |
|  4 | 2013-12-12 18:27:24.068 |
|  5 | 2013-12-12 18:27:24.065 |
|  6 | 2013-12-12 18:27:24.066 |
|  7 | 2013-12-12 18:27:24.062 |
|  8 | 2013-12-12 18:27:24.064 |
|  9 | 2013-12-12 18:27:24.064 |
| 10 | 2013-12-12 18:27:24.064 |
+----+-------------------------+

For the purpose of my tests (the actual queries were different) this is not an issue. Your mileage may vary.

Monday, December 09, 2013

Old and new MySQL verbosity

I was pleased to see Morgan’s announcement about a fix to an old problem of mine. In March 2012 I complained about MySQL verbosity during installation.

In MySQL 5.7.3, this behavior was changed. While the default is still as loud as it can be, you can now add an option (log_error_verbosity) to send only errors to STDERR, which allows you to hide the output of mysql_install_db and still get the errors, if any occur.
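
For instance, here is a minimal sketch of the quiet setup (the verbosity levels are documented in the 5.7 manual: 1 shows errors only, 2 adds warnings, and 3, the default, adds notes):

# start a MySQL 5.7.3+ server so that only errors reach STDERR
mysqld --log-error-verbosity=1 &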

Well done!

However, the same obnoxious verbosity is also in MariaDB 10.0.x. Since I discussed this specific bug with a few MariaDB developers early in 2012, I was disappointed to see this same output when running mysql_install_db with MariaDB. Here’s the same appeal: MariaDB developers, please fix this usability issue!

And now, for the laughing notes. All versions of MySQL available now, from Oracle, Percona, and MariaDB, list this line when installing with mysql_install_db:

Please report any problems with the './scripts/mysqlbug' script!

There is Bug#29716, reported in 2008, about mysqlbug being unnecessary (by then, it had already been obsolete for 2 or 3 years), with a patch submitted but never committed. So, in 2013, we still see a reference to a tool that stopped working at least 8 years ago. It should not take much to remove this line and replace it with an appropriate link to the bugs system.

Sunday, December 08, 2013

Submissions at Percona Live Santa Clara 2014 and Lightning talks

The call for participation at Percona Live MySQL Conference and Expo 2014 is now closed. There have been more than 320 submissions, and this will keep the review committee busy for a while.

One important point for everyone who has submitted: if you have submitted a proposal but haven’t included a bio in your account, do it now. If you don’t, your chances of being taken seriously are greatly reduced. To add a bio, go to your account page and fill in the Biography field. Including a picture is not mandatory, but it will be definitely appreciated.

Although the CfP is closed for tutorials and regular sessions, your chances of becoming a celebrity are not over yet. The CfP is still open for lightning talks and Birds of a Feather sessions.

If you want to submit a lightning talk, you still have time until the end of January. Don’t forget to read the instructions and remember that lightning talks don’t give you a free pass, but a healthy 20% discount.

So far, I have received 16 proposals. Of these, 6 have been rated highly enough to guarantee acceptance (including mine, for which I did not vote). We still have 6 spots to fill (12 spots in total, 5 minutes each), and I would rather fill them with talks that appeal to everyone in the committee than scrape the barrel of the mediocre ones. My unofficial goal is to have so many good submissions that I will have to withdraw my own talk. Thus, the potential number of available spots is 7. Please kick my talk off the stage by submitting outstanding proposals!

Friday, November 15, 2013

Parallel replication: off by one

One of the most common errors in development is a loop or a retrieval by index that falls short or long by one unit, usually because of an oversight or a logic error in the code.

Of the following snippets, which one will run 10 times?

/* #1 */    for (N = 0 ; N < 10; N++) printf("%d\n", N);

/* #2 */    for (N = 0 ; N <= 10; N++) printf("%d\n", N); 

/* #3 */    for (N = 1 ; N <= 10; N++) printf("%d\n", N);

/* #4 */    for (N = 1 ; N < 10; N++) printf("%d\n", N);

The question is deceptive, as there are two snippets that will run 10 times (#1 and #3), but they will print different numbers. If you were aiming for the numbers from 1 to 10, only #3 is correct.

After many years of programming, off-by-one errors are rare in my code, and I have learned to spot them or prevent them at first sight. That is why I feel uneasy when I look at the way parallel replication is enabled in MySQL 5.6/5.7 and MariaDB 10.0.5. In both implementations, there is a variable that sets the number of replication threads:

set global slave_parallel_workers=5;  -- MySQL
set global slave_parallel_threads=5;  -- MariaDB

Yet, in both implementations, you can set the number of threads to 1, and it is not the same as disabling parallel replication:

set global slave_parallel_workers=1;  -- MySQL
set global slave_parallel_threads=1;  -- MariaDB

This setting runs parallel replication with one thread, meaning that you get all the overhead of parallel replication with none of the benefits. Not only that, but replication actually slows down: the extra channel reduces performance by 7% in MariaDB and 10% in MySQL.
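
To spell out the distinction (host names are placeholders; the zero value, which is also the default in both implementations, is what actually disables the feature):

# 0 disables parallel replication; 1 enables its machinery with a single thread
mysql -h host1 -e "set global slave_parallel_workers=0"  # MySQL: plain single-threaded slave
mysql -h host1 -e "set global slave_parallel_workers=1"  # MySQL: all overhead, no benefit
mysql -h host2 -e "set global slave_parallel_threads=0"  # MariaDB: disabled
mysql -h host2 -e "set global slave_parallel_threads=1"  # MariaDB: the same trap
# in both cases, the slave must be stopped and restarted for the change to take effect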

Now for the punch line. In Tungsten Replicator, to disable parallel replication you set the number of channels to 1 (the intuitive value). If you set it to 0, the setup fails, as it should, since there would be no replication without channels. The reason for this better fit is that in Tungsten parallel replication was designed around the core functionality, while in MySQL and MariaDB it is an added feature that struggles to be integrated.

Wednesday, November 13, 2013

Call for papers (with lightning talks): Percona Live MySQL Conference 2014

The call for participation for Percona Live MySQL Conference 2014 is still open. As part of the review committee, I will be looking at the proposals, and I hope to see many interesting ones.

There is a novelty in the submission form. In addition to tutorials and regular sessions, you can now submit proposals for lightning talks, in which I am particularly interested, as I have organized the lightning talks in the past two editions, and I am in charge of continuing the tradition for the next one.

If you want to be a speaker at the conference, here are some tips to get your proposal accepted:

  • Propose a topic that you know well;
  • Take some time to write a well-thought-out and meaningful proposal: nothing gets me to the rejection button faster than statements like “I want to talk about X, I will think of something to say”;
  • Write your proposal with the attendees in mind, i.e. giving information that will make them want to see your presentation;
  • But also write with the committee in mind. There is a space for private messages to the reviewers. Use it wisely if there is something that we need to know.
  • Mind your buzzwords. I am not easily impressed by fashionable topics. Back your proposal with sound reasoning. Don’t assume that I, or anyone in the committee, see things your way, or the way they are reported in the press.
  • Check your spelling. Another way of getting rejected quickly is to misspell the topic you claim to be an expert in.
  • And check your spelling again. If you miss the difference between “know its shit” and “know it’s shit,” I am less inclined to approve.
  • Write a sensible bio. We need to know who you are and what you do, to see if your story is compatible with your proposal.
  • Write enough to make your proposal clear. A proposal that is shorter than your bio will raise a red flag. But do not write too much: you are writing a proposal, not an article on the matter. If you have written an interesting article on the topic, give us a URL.

Regarding the lightning talks, I have some more recommendations.

  • A lightning talk lasts 5 minutes at most. Don’t propose a topic that cannot be exhausted in that timeframe.
  • An accepted lightning talk does not give you a free pass (unless you are also accepted as speaker for a regular talk). You will be given a code to register at a 15% discount.
  • You should propose something that is either highly interesting, or surprising, or entertaining, or all of the above: the lightning talks are a show.
  • Be daring in your proposals. While a regular talk might be refused if you propose to sing the InnoDB settings, a lightning talk on this topic could be seen as legitimate (but you must demonstrate that you can do it!)
  • Convince me (specifically, as I will be choosing the talks accepted by the committee) that you want to be on stage and have the abilities for the show.
  • Be prepared to show your slides earlier than usual. As the organizer, I need to make sure that you have something meaningful to show before sending you on stage.
  • Be aware of the rules:
    • All slides will be loaded into a single computer, to minimize delays between talks;
    • Someone (probably me) will nag the speakers until they either surrender their slides or escape to Mexico;
    • All speakers will meet 15 minutes before the start, and be given the presentation order. Missing speakers will be replaced by reserve speakers;
    • The speaker will have 5 minutes to deliver the talk.
    • When one minute is left, there will be a light sound as a reminder of the remaining time.
    • When 10 seconds are left, most likely the audience will start chanting the countdown.
    • When the time is finished, the speaker must leave the podium to the next one.

If you have reached this point, you are ready to submit a proposal!


Friday, October 11, 2013

Tungsten Replicator Filters: A trove of golden secrets unveiled

Since I joined the company in late 2010, I have known that one of the strong points of Tungsten Replicator is its ability to set filters. The amazing capabilities offered by Tungsten filters cannot be fully grasped unless we explain how staged replication works.

There are several default stages in the replication stream. Every stage has an extraction task and an apply task. The extraction task gets data from the previous stage’s repository, and the apply task saves the data to the next repository, which can be either temporary storage (a memory queue, a THL file) or the final destination (the slave database server). Consider that the architecture allows developers to add stages, and you will appreciate its full power. For every stage, we can insert one or more filters between the two tasks. This architecture gives us ample freedom to design our pipelines.


Figure #1: Tungsten Replicator stages
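
For reference, here is a sketch of the default pipelines as I remember them (stage names may differ slightly between versions; remote-to-thl is the one mentioned again below):

# master pipeline: extract from the binary log, store into the THL
#   binlog-to-q   : read events from the MySQL binary log into a memory queue
#   q-to-thl      : write the queued events to the THL file
# slave pipeline: fetch from the master, store, apply
#   remote-to-thl : transfer events from the master's THL over the network
#   thl-to-q      : read the stored events into a memory queue
#   q-to-dbms     : apply the queued events to the slave database server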

What’s more interesting, filters in Tungsten can be of two types:

  • built-in filters, written in Java, which require users to compile and build the software before the filter is available;
  • Javascript filters, which can do almost everything the built-in filters can do, but don’t require recompiling. All you need to do is deploy the filter in the appropriate folder and run a tpm command to install or update the replicator (see the sketch after this list).
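
As an illustration of how light the second flavor is, here is a hypothetical deployment of a custom Javascript filter. The folder and the tpm option name below are my assumptions from memory: verify both against the documentation before trying.

# hypothetical sketch: deploy a Javascript filter and enable it on the apply stage
# (the path and the --svc-applier-filters option name are assumptions)
cp myfilter.js /opt/continuent/tungsten/tungsten-replicator/support/filters-javascript/
./tools/tpm update myservice --svc-applier-filters=myfilter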

The Tungsten team has developed a large number of filters, either for general-purpose pipeline control (such as renaming objects, or excluding or including schemas and tables from replication) or for very specific tasks that were needed to meet customer requirements or to help implement heterogeneous pipelines (such as converting ENUM and SET columns to strings, dropping UPDATE and DELETE events, normalizing DDL, etc.). There are 23 built-in and 18 Javascript filters available. They were largely undocumented, or sparsely documented in the code. Now, thanks to my colleague MC Brown, Continuent’s director of documentation, the treasure is there for the taking.

Tungsten filters are a beautiful feature to explore, but also a powerful tool for users with special needs who want to create their own customized pipeline. A word of warning, though: filters are powerful and sometimes fun to implement and use, but they don’t come free. There are two main things to keep in mind:

  • Using more than one filter in the same stage may result in performance loss. The amount of the loss depends on the stage and on your system’s resources. For example, if your replication’s weakest point is data transfer across the network, adding two or three filters to this stage (remote-to-thl) will make things worse. If you can apply the filter to the previous or the next stage, you may achieve your purpose without many side effects. As always, the advice is: “test, benchmark, repeat.”
  • Filters that alter data (such as ones that exclude schemas or tables, or drop events) will make the slave unsuitable for promotion. You should always ask yourself whether the filter may make the slave a lousy master, and if it does, make sure you have at least one other slave that is replicating without filters.

As a final parting thought, be aware that yours truly and the above-mentioned MC Brown will be speaking at Percona Live London 2013, with a full, take-no-prisoners, six-hour Tungsten Replicator tutorial, where we will cover filters and give some explosive and sensational examples.

Wednesday, October 09, 2013

Speaking at the MySQL NoSQL & Cloud Conference & Expo in Buenos Aires

I am on my way to Argentina, where I will be speaking at the MySQL NoSQL & Cloud Conference & Expo.

I have two talks: one on my pet project MySQL Sandbox, and one on replication between MySQL and MongoDB (using another project dear to me, Tungsten Replicator).

It’s my first visit to Argentina, and I will try to look around a bit before the conference. I also look forward to seeing many former colleagues and well-known speakers at the conference. The lineup includes speakers from Percona, EffectiveMySQL, PalominoDB, MariaDB, SkySQL, Tokutek, and OpenStack.

I am looking forward to this trip. My presentation on MongoDB replication is a first for me, and I am always pleased when I can break new ground. I have the feeling that the networking is going to be more exciting than the sessions.

If you are in Buenos Aires, come to the conference and say hi!
