Optimization series: steps 0 and 1

I will be doing a series of posts on optimization, namely software optimization for speed, in the coming weeks and months. Optimization is by far my favourite thing in programming. To me, there’s nothing quite like taking code that works and making it code that works faster. I don’t get to do it as often as I’d like, and the same will probably go for you, whether you want to or not, but I will share what knowledge I’ve garnered over the years doing this.

 

Step 0: Pre-preparation

 

(Of course this is 0-indexed…) The very first step, and please pay attention, is that YOU MUST NOT, UNDER ANY CIRCUMSTANCE, BEGIN WITH THE OPTIMIZATION. That is: FIX THE BUGS FIRST. This is critical, optimization is very often a feature, or at the very least a defect correction, but more often than not an improvement. If you speed up code that doesn’t work, it will only crash faster. There is such a thing as preemptive optimization which can lead to more trouble than worth it down the road, and it’s a documented pitfall. Consider yourself warned.

(It goes without saying that standard programming practices must be applied, such as having backups and using a form of version control for your code. You must set that up if it’s not, for your own sanity as well as for the needs of the work ahead.)

 

Step 1: Preparation

 

With a stable codebase and no (known) bugs, we can begin. Optimization (and this is one of the reasons that makes it so attractive to me) is like experimental science. In science experiments, there is a very important notion that applies here. Scientists call it control. For example, for pharmaceuticals, there is usually a “control group” which is given a pill that does nothing, a placebo, along with other groups which will be given the actual drugs being tested. In the end, every subject took “a pill”, and that’s the scientist’s basis for analysis and comparison. This is also taught in high school science: don’t contaminate your samples, which is another way to say “don’t introduce unknowns.” It’s the same thing with software, but we’ll usually talk about benchmarking.

What you need to do is have all the conditions required in order to run your benchmark, in such a way that you can get back to it easily. That could mean working with backup databases, test files, virtual machines with snapshots, etc. If your optimization requires something more “exotic”, say simulating a faint wifi signal, near capacity memory or CPU usage, etc, you need to be able to bring up those same conditions, so that when you test your changes, you’re actually only testing your changes. In all cases, a reference base is needed, an initial state of input. So your first step (well, after fixing the bugs), is to BE ABLE TO EASILY GO BACK TO YOUR BENCHMARK FOR REFERENCE AND COMPARISON. This also implies being able to compare the end result, not just for speed but for accuracy. Because optimization is about finding a quicker route from point A to point B, not ending up at point C.

Got that? Don’t come back until you have that.

Categories: Optimization, Programming

Rounding in .NET: A fair warning

2014-04-01 2 comments

Everybody uses the same rounding they learned in school. Only Siths deal in absolutes, as they say, but this is really really basic: .0, .1, .2, .3, .4 round down, and .5, .6, .7, .8, .9 round up. We’ve all learned this in school.

.NET Framework designers have taken it upon themselves to make fools of us all by defaulting the rounding algorithm (used by Math.Round) to something called “Banker’s rounding.” It behaves the same, except for .5, which will round to the nearest even integer. 2.5 rounds down to 2, 3.5 rounds up to 4. This was done so as to distribute .5 evenly up or down. Or a better explanation might be…

Read more…

Categories: Basics, Programming Tags:

Counting objects

This is a variation on the classic C++ approach, adapted to C#. While .NET has its Garbage Collector which takes care of all objects you don’t need, at some point one might need to track how many instances of an object were created, or are active.

public class SomeClass
{
    public static int ObjectsCreatedCount = 0;
    public static int ObjectsActiveCount = 0;
    public SomeClass()
    {
        ObjectsCreatedCount++;
        ObjectsActiveCount++;
    }
    ~SomeClass()
    {
        ObjectsActiveCount--;
    }
}

That way, you can use SomeClass.ObjectsCreatedCount to know how many objects were created, and SomeClass.ObjectsActiveCount to know how many objects are still around at any point.

Another thing of note: C# classes do have destructors, contrary to popular belief. It has no delete keyword because of the Garbage Collector, and there’s little management to do for memory, but there’s still a possibility of using a destructor.

Categories: Basics, Programming Tags:

Worst way to do a sum

I’m dealing with some seriously bad code. It’s sort of like this.

Suppose you need to add two integers in C:

    int sum(int x, int y)
    {
        return x+y;
    }

Let’s not even consider nastiness like int limits, just a plain sum that works. That above is the sane way. That’s how you work. You do a single, tiny function that does a single job well.

This is the legacy I’m dealing with:

    //Declared globally
    int sum1 = /* a value */;
    int sum2 = /* another value */;
    int sum; /* the result */
    sumx(); /* execution */
    //...
    void sumx()
    {
        sum = sum1+sum2;
    }

Technically, it works. Same result. What’s the difference? Maintainability.

Categories: Basics, Programming Tags:

The one rule for software maintenance

Code which cannot be maintained might as well just not work or even exist.

– Thierry Eude, Ph. D., Ing. jr., Université Laval

Cast away

My C++ class is done, and I feel like I learned some tiny things here and there, mostly refresher stuff, but not a lot of meat to sink my teeth into.

So, to learn more, I turned to Stack Overflow, and while wanting to be helpful, got served. It’s alright, it’s cool, I learned I had still lots to learn. And that’s why I will talk about casting.

Usually, in a function, you should get your types right, and avoid wasteful casting. That’s true. Compilers throw warnings at you if you don’t respect types, and that’s usually where you need to be explicit about what you mean. C has the classic cast, what most call the C-style cast:

    int i = 35;
    float f = (float)i;

C++ offers a lot more tools. Most of the time, we’re casting to another type to adapt to a situation, a function, etc. C++ allows us to be more explicit, which is good, because you’re making your meaning clear to future maintainers of your code (and that can be yourself). Never neglect the power of explicit code. Don’t just cast to make it work (or not send a warning), cast and be clear about your intentions.

These are called casting operators, and they go as follows.

First, const_cast.

const_cast exists to deal with whether a variable or parameter is constant or not. Sometimes, you need to get rid of “const” for just a bit. Casting it C-style works, but the C++ way is crystal:

void nonconstfunc(int *i);

void constfunc(const int *i)
{
    nonconstfunc(i); //error, this will not compile
    int* i2 = const_cast<int*>(i); // but convert it to a non-const int pointer
    nonconstfunc(i2); //and use the non-const copy instead!
}

Next is reinterpret_cast.

reinterpret_cast does a very specific job: it can cast a pointer to an integral type and back. So if you have a pointer and you want to print its address, say, you can use this:

int main(int argc, char *argv[])
{
	long adr = reinterpret_cast<long>(argv);
	cout << adr;
	return 0;
}

It’s a very specific use. Changing the value and casting it back to pointer is definitely not recommended. You can see this being used to hash an address and put it in a hash table, for example.

The next two are opposite sides of the same coin: dynamic_cast and static_cast. I don’t fully grok them yet, so I will not elaborate on them in this post. Suffice it to say that they complement each other. dynamic_cast runs real-time checks, while static_cast does not, which makes it faster. Both can cast up or down a chain of inheritance. So I will grok and get back to this.

Categories: Learning Tags:

Simple C# Serialization

2012-06-18 1 comment

(Kudos if you noticed the alliterated title)
(Edit: this is an old article I never published. Until now! I hope it serves as a nice introduction to beginners).

Serialization is a thing I had (until recently) left untouched. For too long. Well, no more. Here are the basics.

What is serialization?
Simply put, it’s taking an object’s value(s) and writing them to a file (or a buffer, but I’m not covering that here) for transmission or storage.

What is deserialization?
It’s the reverse, building an object from a file (or buffer).

Why serialize?
Serialization can be used for various reasons, such as transmitting data to another computer or storing it. Storage could potentially be done with a database, but even something portable like SQLite is another thing to maintain, whereas for minimal needs, serialization offers quick and dirty clean storage for later use. For example, I’m using it for storing a few sets of parameters for batch processing.

Here’s how. I will be using the simplest form of XML serialization (that is, writing to, and naturally reading from XML), not Soap, but with the very vanilla System.Xml.Serialization namespace.

First, add System.Xml to your project’s references.

Now, we’ll design the basic class we want to serialize. First, the class without the serialization attributes:

public class ClientInfo
{
    public string ClientName { get; set; }
    public int ClientId { get; set; }
    public DateTime ProcessTime { get; set; }

    public ClientInfo (string clientName, int clientId)
    {
        ClientName = clientName;
        ClientId = clientId;
    }
}

To serialize, first we’ll need the proper namespace, so add:

using System.Xml.Serialization;

in your using block at the top.

Next, we’ll decide what we want to serialize. Let’s say ClientName and ClientId, but ProcessTime shouldn’t be serialized at all (for some reason. Hey, I just needed the example).

So, we’ll need to tell the compiler that the class is serializable, simply by adding the attribute above the class declaration:

[Serializable]
public class ClientInfo
{/...

By default, all fields become serializable. Let’s specify the field to ignore appropriately:

    public string ClientName { get; set; }
    public int ClientId { get; set; }
    [XmlIgnore] //Tells C# to forget about that value for serialization
    public DateTime ProcessTime { get; set; }

Serialization also requires a default constructor. And the nice thing is that it does not have to be public, so your class logic can remain the same from outside (i.e. no default instantiation if you don’t want it):

    private ClientInfo()
    { }

The total result:

using System.Xml.Serialization;

[Serializable]
public class ClientInfo
{
    public string ClientName { get; set; }
    public int ClientId { get; set; }
    [XmlIgnore]
    public DateTime ProcessTime { get; set; }

    public ClientInfo (string clientName, int clientId)
    {
        ClientName = clientName;
        ClientId = clientId;
    }

    private ClientInfo()
    { }
}

And we’re done for that part! Now, suppose you want to serialize that into an Xml file. Use a function like so:

public static bool SaveConfig(ClientInfo clientInfo, string filePath)
{
    bool success = false;
    using (FileStream flStream = new FileStream(filePath, FileMode.Create, FileAccess.Write))
    {
        try
        {
            XmlSerializer xmlSerializer = new XmlSerializer(typeof(List));
            xmlSerializer.Serialize(flStream, clientInfo);
            success = true;
        }
        //do add proper handling of possible stream exceptions here
        finally
        {
            flStream.Close();
        }
    }
    return success;
}

And reading, essentially taking the XML class and making it into an object is just as easy, it’s called
deserialization:

public ClientInfo LoadConfig(string filePath)
{
    if (!File.Exists(filePath))
        return null; //Or ClientInfo.Default if you made one
    
    ClientInfo clientInfo = null;
    using (FileStream flStream = new FileStream(destFileName, FileMode.Open, FileAccess.Read))
    {
        try
        {
            XmlSerializer xmlSerializer = new XmlSerializer(typeof(ClientInfo));
            clientInfo = xmlSerializer.Deserialize(flStream) as ClientInfo;
        }
        //do add proper handling of possible stream exceptions here
        finally
        {
            flStream.Close();
        }
    }
    return clientInfo;
}

This is just the tip of the iceberg. You can serialize/deserialize sub-objects, collections, etc. This is very useful for reading and writing back. And of course if you need to serialize/deserialize something large, there are better ways, like binary serialization!

Categories: Basics, Programming Tags: