Archive

Archive for May, 2009

Nice Compression: Huffman Coding

May 27th, 2009

In this post we are going to see a brief resume of one of the most famous coding algorithms, Huffman coding. It’s purpose is to find a way to encode information in less space than the standard one, that is, to compress information. It was developed by David Huffman and published in 1952.

We are going to see examples using plain text encoded in ASCII. Later on the post we will see how this algorithm can be applied to other sorts of data.

Basic idea

If we have some text, usually we are storing 8 bits of information for each of its characters, which allows us to represent 256 different characters. Even in the subset of those 256 characters actually being used, the frequency at which they appear is (usually) very different. That is the fact that Huffman coding tries to exploit. If we could encode frequent characters with fewer bits than characters less frequent, the total length of the coded text will usually be smaller. (Yes, if you want to encode a text where every character has the same frequency, Huffman coding cannot help you. Sorry about that).

So, we need a non-ambiguous, variable-length way to encode characters, where characters get codes with length inversely proportional to their frequency. By non-ambiguous I mean that we should have a way to know where one character’s encoding ends, and so the next one begins. For example, if ‘A’ was coded as 1 and ‘B’ as 11, and we received 11, we could not know whether it was encoding “AA” or “B”‘.

David Huffman approached this problem using a binary tree, where at each node 0 represents a child and 1 the other one, and the characters we want to encode are represented by the leaves. The most frequent characters would be, in this tree, nearer from the root, and the most rarely found characters would be in deeper levels.

Tree construction

So, how can we construct this kind of tree in such a way that it minimizes the resulting encoding length? First of all, we need to count the frequency of every character. We are going to see the set of characters as elements, each of them with an associated weight, being the weight the number of times that it appears in the text. Now, we just select the two elements with the lowest weight, make that two elements siblings of a new parent node, remove those two elements from the set and insert the new sub-tree constructed to the set, being its weight the sum of its two children. We repeat the process until we have only one element, that is, the Huffman tree, which minimizes the encoding length.

Huffman tree
Huffman tree

Using it

We want to use this tree in two ways: for encoding text and for decoding it:

  • Encoding: We have to find the path, for the character we want to encode at each moment, from the tree’s root to the leave where it is. For every node from the root to the character, insert into the coded stream the bit linked to the decision you must take to reach your target character.
  • Decoding: At first we start from the tree root. For each bit we read, we move to its related node, until we reach a leaf. In that moment we get a decoded character, and start again from the tree root.

A thing to take into account is that the decoder should have access to the Huffman tree, so we must send/store it with the encoded data. There is also the option of agreeing beforehand in some Huffman tree so we don’t have to waste space with it, but this can lead to a worse compression ratio.

We should also bear in mind that depending on the text, specially if it is short or has not much difference in frequency among characters, the fact of storing the tree can be counterproductive, but it is usually not.

Evolution and variations

A number of variations with higher performance have been built over this algorithm, but I consider that the original is much simpler and nicer.

Some of these variations consider not only characters but groups of them, or analyze if there are different frequencies of the characters in different parts of a file and then using different trees for the different parts.

Finally, just say that I have just used text examples on this post, but of course the algorithm works well on any kind of data with different values appearing at different frequencies. The fact of some codecs as MP3 or JPEG, among others, using Huffman coding as part of their encoding/decoding algorithms corroborates that and shows David Huffman’s work usefulness.

Algorithms , ,

10 reasons you should know C++

May 20th, 2009

First of all, as I know that this can be a sensible topic for someones and that it can turn to a religious war, I want to remark that I’m only saying that, for a programmer, to know C++ is a good thing to do, and I am NOT saying neither of this:

  • That C++ is the best language around, as that concept does not exist.
  • That you should use it every time you have to program something. The language to choose for a project depends, of course, on the project itself.
  • That it is the only language you should know. It is not.

Here comes the list. There are some reasons more focused at the knowledge you can get, and others more focused at work and employment issues.

1. C++ has influences in many other current languages

When other currently most used languages, such as Java or C# appeared, C++ had already been here for a while, and, as one of their goals was to substitute C++, is no surprise that they have been highly influenced by it. So, learning C++ you will learn an important subset of many other languages, making you easier to learn them later if you ever need to.

2. Huge amounts of documentation for it

As, probably, the most used language in the last 20-25 years, there are tons of resources oriented at it. Books, forums, blogs, you will find virtually any kind of help you may dream of.

3. Many code samples are coded in C++

Whenever someone, including book authors, need to write some code and the audience is not expected to share any other language, he will usually implement that code in C++. So, as in many cases you are actually expected to understand it, it would be good for you to really understand it well, in order to learn more from the source you are reading/listening.

4. It will provide you a better understanding of other systems

Some people criticizes C++ as difficult for explicitly dealing with low-level things as pointers or explicit memory management. While it’s true that in some circumstances you would get no benefit in dealing with them, and it’s then more a source of bugs than anything else, it’s also true that having a sincere, deep understanding of them will make you a better programmer, and you will have a wider comprehension of the system, even when programming on some language that abstracts those concepts.

5. Templates & generic programming

C++ provides you the possibility of programming functions not taking into account the types of some variables or some of the arguments. This opened the possibility of truly algorithmics, and has been used in many other languages since then.

6. Interfacing languages

There are lots of different libraries, and many of them are in C++. If you program in any other language, chances are that some day you will have to interface some library, and it will probably be in C++, so you’ll have to interface it, and knowing C++ will save you time.

7. Efficient machine code produced by C++ compilers

C++ has had efficiency as an objective from the start, and, while the gap has diminished in recent years, C++ compilers are still the ones producing most efficient machine code. You may find yourself in the need of writing some part of code where you must control things that higher level languages do not allow you. Knowing C++ could help you then.

8. There will always (or, at least, for a very long time) be a niche for C++

It’s true that the C++ ratio usage has diminished in the last, say, 10 years. That’s mostly because it was used for applications that did not really needed a lot of control and performance. But almost all of those applications are no longer written in C++ any more. Most of the projects being currently developed in C++ do actually have time or space constraints, and they use a C++ for a reason. Systems programming, simulations, game programming and some others are niches that are not going to switch to another language for a long while.

9. C++ projects are more interesting than the average

As pointed out by Alastair Rankine in a post which I agree with, C++ projects are over-represented in the set of interesting projects and under-represented in the rest of profitable projects. All other things being equal, working on an interesting project instead of non-interesting one is a great change.

10. You will hardly be seen as a hacker if you don’t know C++

This is very important if looking for a new job. Companies (intelligent ones, at least), when looking for programmers, prefer hackers, people with passion for programming. This, again, happens most on interesting projects, which are usually harder too. Knowing C++ you will not be immediately seen as a hacker, but the inverse applies here. There are few hackers that don’t know C++, and your reputation as a programmer will be lower if you don’t know it.

General , ,

Turning your loops faster, II

May 13th, 2009

This post is a continuation to “Turning your loops faster, I”. If you haven’t read it, you may want to before reading this one. Here we continue reviewing some loop optimizing techniques.

Loop unrolling

We saw on the previous post that there is an overhead we pay on each loop iteration. A simple idea, so, is to do in a single new, bigger iteration, the work that was done in many former iterations. This way, there are less iterations to do and less overhead per element.

Due to the fact that the most used current processors can execute instructions out-of-order, a bigger benefit can be gained from loop unrolling as long as there are no data dependencies among different iterations, as we can also avoid the pipeline from flushing in some points, and so not losing cycles. As many other optimitzations, this can be already performed by your compiler, based on some heuristics, but maybe their results are not the optimal ones. You may want to try this specially on loops with many iterations and few code instructions.

// Original loop
sum = 0;
for(i=0; i < N; i++)
    sum += a[i];

// 2-Unrolled loop
sum = 0;
for(i=0; i < N; i+=2)
{
    sum += a[i];
    sum += a[i+1];
}

You have to be sure that the number of iterations is multiple of your unrolling degree; otherwise you have to do the remaining operations, non unrolled, at the end of the unrolled loop.

This was just a simple example, in this case the compiler would probably be already doing unrolling, and maybe in a greater unrolling degree.

Using accumulators

Even if we have data dependencies, sometimes we can slightly modify the algorithm to use different variables in a loop and so avoiding the processor from having to wait for the previous result when it really doesn't needs it.

// 2-Unrolled loop with accumulator
sum = sum2 = 0;
for(i=0; i < N; i+=2)
{
    sum += a[i];
    sum2 += a[i+1];
}
sum += sum2;

This way we do not have data dependencies inside the loop .Compilers usually do not apply this optimization.

Loop collapsing

Sometimes we have nested loops that can be joined into a single loop, usually with pointer arithmetic. This optimization is usually not done by compilers, and it can also enable other optimizations.

// We have a bidimensional array, and we want to set all of its elements to 0
int a[N][M];

// Usual way of setting all elements to 0
for (i = 0; i < N; i++)
    for (j = 0; j < M; j++)
        a[i][j] = 0;

// Collapsing loops
int *p = &a[0][0];
for (i = 0; i < N*M; i++)
    *p++ = 0;

This way we have fewer loop overhead per element.

Loop fusion

This technique can be applied when we have two loops that iterate over the same set (or, at least, they have a common subset). Knowing that, we can merge both loops and have half of the previous loop overhead. Care must be taken to avoid data dependency problems. This is usually not done by the compiler either.

// We have 2 arrays, a and b, of size N. We want to increment all the elements of both arrays.
for(i=0; i < N; i++)
    a[i]++;
for(i=0; i < N; i++)
    b[i]++;

// Fusing the loops
for(i=0; i < N; i++)
{
    a[i]++;
    b[i]++;
}

Reversing order in nested loops

If we have nested loops, and use them as index to access memory, for example, to access a bidimensional array, we must take advantage of the memory hierarchy. Sometimes, just reversing the nested loops order, a big speedup can be achieved. It's important to know your language, in particular how your language stores multidimensional arrays. C stores them by rows, but there are some languages that store them in the inverse way.

I have done some tests on my machine. Here are the two codes:

// Not cache-friendly
for(int i=0; i < N; i++)
	for(int j=0; j < N; j++)
		a[j][i] = 1;

// Cache friendly
for(int i=0; i < N; i++)
	for(int j=0; j < N; j++)
		a[i][j] = 1;

First I tested with an array of 10000 x 10000 int elements. The speed-up obtained by the cache friendly version was 5.46x.

Then I tested with an array of 2¹³ int elements each dimension. In this case, the obtained speed-up was 15.14x. This is caused because the cache friendly code continues exploiting spatial locality while the cache unaware code not only does not exploit it but it's using only a subset of cache lines, because every iteration in the inner loop is accessing memory addresses with the same lower bits, which are used to know where does that memory go in the cache.

Condition evaluation order

When you have more than one condition ORed or ANDed, the compiler will short-circuit it. That is, if the conditions are ORed and the first one is true, it will not check the others, as the result will be true anyway. If the conditions are ANDed and the first one is false, it won't check the others either, as the result will always be false. So, if you have one condition much slower than the others, put it the last one, as you will avoid evaluating it some times. If you know that a condition is going to cause more short-circuits than the other ones, put it the first one so you will avoid evaluating the other conditions more frequently.

Other techniques

These are the most common basic techniques. You can usually achieve greater speed-ups from here with parallellization, using techniques as vectorization or threading, which we will talk about in future posts.

Architecture, Optimization , , ,

Turning your loops faster, I

May 6th, 2009

It’s very usual in our programs that most of its time is spent in loops. So, after having profiled and measured our application, if it is the case, we may want optimize some of our loops. Maybe they are already optimized, by us or by the compiler, but it’s important to know some techniques that may turn our loops faster.

Branch overhead

This is an important concept in order to understand many of the loop optimizations.

Current processors are pipelined. Not a little bit. A lot. An instruction is at many stages before ending its execution. Most of the instructions can be well adapted to a pipelined design without major efforts, but that’s not the case with branching instructions. Because of their nature, they (can) change the next instruction to be executed, from the implicit (the following) one to an explicit one, often depending on a value produced by an instruction which has not yet finished its execution when the branching instruction is fetched. So we have a problem, we have to wait, flush some (many) stages and lose some cycles, something that we don’t like, because it does our programs slower.

In fact, the reality is not so hard. Processor companies have, of course, seen this bottleneck in their products, and have had lots of engineers trying to solve them. The result is that different branching predictors have been designed, some of them are really effective, and what was a huge performance penalty is now a smaller one. Although it is now a smaller problem than it used to be, we still have to be aware of it, specially in loops, because loops imply, inherently, branching.

Inlining

Inlining means inserting the body of a function where it is called instead of just calling it. This can be done both by the compiler or by the programmer, and it’s not just a loop optimization, it can be applied anywhere else, but it’s in loops where we’ll get the bigger benefits of inlining our smaller functions.

Beware that:

  • Compilers, depending of the set options, already try to guess which functions would be good for inlining. You can want to use some tools to look which functions are actually inlined. Try to know what your compiler’s policy is in this aspect.
  • Function inlining can actually cause you performance loses in some cases, because it can imply a larger number of instructions and a worse explotation of the instruction cache.
  • If the compiler does not inlines some function, you try to manually inline it, and you in fact have a performance gain, think before leaving it there if the speed-up is worth having some code duplicated, which can lead to maintenability problems.

Another good thing about inlining is that it can let, in some situations, your compiler to do other kind of optimizations which it wasn’t able to do without inlining, because of a better knowledge of the code context.

Code Hoisting or Loop-invariant code motion

They key idea here is to move some code out of the loop, in order to avoid calculating it at every iteration, when its result will be the same during all the loop execution. Imagine the following code:

for (i = 0; i < n; i++)
{
  if(x)
    a[i] = y;
  else
    a[i] = y*y;
}

Here, a couple of optimizations can be made:

  • "x" does not change inside the loop, so we don't have to check that condition, and execute branching instructions, at every single iteration.
  • "y*y" does not change inside the loop either. So we can have it precomputed outside the loop and assign it.
if(x)
{
    for (i = 0; i < n; i++)
        a[i] = y;
}
else
{
    y2 = y*y;
    for (i = 0; i < n; i++)
        a[i] = y2;
}

The program's semantic is the same but the performance is greater in the second code. Again, be warned that your compiler might be already doing this optimization for you.

More techniques

There are many other techniques, and we will see some of them in the next post, so stay tuned!

EDIT: Second part already avalaible

Architecture, Optimization , , ,