MacGyver Development

Wednesday, 7 January 2015

Solve the party planning problem and earn $1 000 000 USD

The party planning problem deals with a village of 200 people where two villagers either are friends or dislike each other. A big party is to be planned and your task is to find the largest set of people in the village to be invited to the party fulfilling the condition that all the invited people must be friends with each other.

This seems like an ordinary puzzle that could take some hours to figure out. However, after giving it some thought, it turns out to be a quite hard problem. In fact it is a version of a well known problem called the “Clique problem”. It is the mother of hard problems and is intricately connected to other hard problems such as prime factorization, the knapsack problem and the travelling salesman. This problem is so hard that the Clay Mathematics Institute has offered a prize of $1 000 000 to anyone who can show that this or similar problems in the same class of problems can be solved in a practical way. Likewise, you win the money for showing that it is impossible to solve these problems without spending an amount of time in comparison to the age of the universe.

Before going into the fascinating details of this problem we should start with discussing what easy and hard problem really are from a computational standpoint. Consider the problem of solving the multiplication of 17 x 13. If you can’t solve this directly in your head you know a method from school on how to multiplicate the digits individually to come to the conclusion that the product is 221. If feels like an easy problem. Consider the reverse, what numbers should you multiply to get 221? This is harder, we probably have to test a lot of numbers to see if they divide 221 without a remainder to finally find that 17 and 13 are the factors of 221.

Now consider the same problem but with the numbers 37975227936943673922808872755445627854565536638199 × 40094690950920881030683735292761468389214899724061. If you’re not Rainman you will probably not do the multiplication in your head but you could do it with a pen and paper and some spare time. A computer would calculate the product in a split second. What about the reverse? Finding what two numbers to multiply to receive the product 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 turns out to be extremely hard even with the help of a computer. It is so hard that this is the cornerstone of communication security on the Internet (RSA cryptography).

What is the actual difference? For these kind of problems we have methods also known as algorithms which acts as recipes on how to produce a solution to a specific kind of problem. In the case of multiplication we know from school that we should write the numbers above each other and multiply each digit with all the digits in the other number. The total number of simple operations will in this case be n² if the numbers have n digits. This is a common way to denote how hard a problem is to solve. If the problem has an input of size n, in this case the number of digits in the numbers to multiply, the solution will require us to perform a number of simple operations proportional to some function of n, in this case n². If you remember from school, a polynomial is a function where the variable is raised to different powers such as n¹ + n² + n³ + … n^x. So the complexity of the multiplication problem which is n² is a polynomial.

A problem which can be solved with an algorithm that performs something proportional to a polynomial number of operations in relation to the size of the input is considered an easy problem. There is in fact a name for these kind of problems. They belong to the class P of problems where P means that they can be solved in polynomial time.

Now let’s consider the reverse of our math problem where we tried to find the factors of the product. In this case, since we must test numbers smaller than the product for divisibility, the number of simple operations needed in this naive algorithm will grow exponentially with the number of digits in the product. For example, to find the factors of 221 we have to try whether it is divisible by 2, 3, 4, 5 … and so on. A number of one more digit like 5671 would require 10 times as many tries.

Problems like this where the number of operations needed to find the solution grows exponentially with the size of the input are considered hard. An interesting thing about this hard problem is that if you in some way find a solution (the factors of the number), it is an easy thing to verify that the solution is correct. You would simply multiply the factors and see whether they really produce the product. More formally, the task to verify the solution is in the class P of problems.

Now we’re closing in on the $1 000 000 problem. All these problems, which seems very hard to solve, but once you have a solution it is straight forward to verify that the solution is correct, are called NP problems (nondeterministic polynomial time). This means that we don’t really know the complexity to solve them, but a solution can be verified with an algorithm that belongs to class P (an easy task). To win the money you must show that either P = NP or P ≠ NP. That means, either you must prove that you can solve an NP problem in a fast way or show that this is impossible and there can’t exist any algorithm however smart it is that can solve these problems in a fast way.

Do you see the connection with the party planner problem? It is hard to find a group of people which all are friends among all the presumably invited persons. But once you have found this group, it is easy to verify that all these persons really are friends.

Another way to state this is that all these problems are some sort of search problems. You try to search for factors of number, a subset of party people or for all the other NP problems there are similar searches needed. The thing you win the money for is to determine whether you can solve these type of search problems without actually need to search out all the possibilities in a brute force manner which often means an exponential number of operations to test all combinations.

Let’s take a closer look at the party planning problem, or as we will call it from now, the Clique problem. We simplify our example and assume that there are seven people in the village and they can be represented as a graph. Each villager is a node in the graph and paths between two nodes indicates that the corresponding villagers are friends with eachother. The problem now is to find the largest subgraph where each node is connected to all other nodes in the subgraph. Such a subgraph is called a clique.

The naive algorithm would try all combinations to verify whether there is a clique of size 1, 2, 3 and so on until no such clique can be found. In the example below an exhaustive search is made to find a valid clique of size 4.

See how the search complexity would explode in possible permutations when increasing the number of nodes to 8, 9, 10...?

There are many NP problems which on the surface seems very different. Some famous examples include the travelling salesman problem (TSP) where you search for a tour between a set of cities shorter than a given length. The boolean satisfiability problem (SAT) where you try to find a configuration for a set of boolean variables that makes an expression true. For example the expression (a AND NOT b) can be evaluated to true if a = true and b = false. There are no known algorithms to solve these problems in polynomial time.

Now for some mind boggling facts about these problems. The clique problem together with several other NP problems are part of a special subclass in NP called NP-complete problems. Such problems have the property that all other NP problems can be converted to them in polynomial time. This means that for example the problem of factorizing a large number can in a fast way be remade into a graph clique problem. Solve the clique problem and you have the answer for the factorization.
See what this implies? If you can find a fast way of solving the clique problem or any other NP complete problem you have as a side effect solved all other NP problems since they can trivially (in respect to computational time) be converted to that problem.

If this algorithm was discovered, the digital landscape would be completely changed. Internet security would become obsolete over night and many other applications could be revolutionized with faster algorithms.

The NP equals P problem is probably the most famous open question within the field of computational computer science. Solve it and your name will be engraved in various precious metals.

Footnote:
The numbers for the example of large prime factorization described in this article are in fact taken from another challenge given by the RSA company. They posted publically a set of large semi prime numbers in 1991. A semi prime number is the factor of two primes. Each number had a reward attached so that when someone managed to factor the number they could claim prizes from $1,000 up to $200,000 USD. The numbers ranged from 330 to 2048 binary digits. The reason for this was to survey the academic landscape and find the cutting edge algorithms for prime factorization which their cryptographic product is based on. To current date all challenges up to 768 digits ($50,000) are beaten by various teams around the world. The challenge was discontinued in 2007.

Tuesday, 6 January 2015

Raspberry Pi Weather Station with multiple thermometers

Last Christmas I gave my father in law a weather station to monitor temperature inside their summer house to avoid the pipes to freeze during the winter. It has been working well but has only been measuring the temperature inside the house. During the holiday we added another sensor to be able to also read the temperature outside the house.
See this blog post for the original setup
http://macgyverdev.blogspot.se/2014/01/weather-station-using-raspberry-pi.html

Adding more thermometers is straight forward since the sensor used, a digital DS18B20, uses the OneWire protocol which the Raspberry Pi has drivers for. Adding more sensors doesn't require more ports on the Pi but instead reuses the same three ports for power, ground and data as the first sensor. Each thermometer has a unique hard wired address which is used to communicate over the data wire with the Raspberry Pi driver.

Programmatically, the new sensor is represented as a device folder in /sys/bus/w1/devices. You can simply cat this new file and find the latest temperature reading.

The Raspberry Pi posts the temperature readings to a CouchDB server and a small single page app presents the historical readings with graphs for the current day, week and month.

The original code is available on GitHub at https://github.com/johannoren/WeatherStation.
The adaptations for multiple sensors can be found on branch https://github.com/johannoren/WeatherStation/tree/multiplesensors

Monday, 28 April 2014

Become a better programmer with programming challenges

When I went to university me and some friends spent time on practising algorithm implementations for the national programming contests that were held each year. The concept was (and still is) to solve about ten problems in a couple of hours time using a programming language of choice. The source code is submitted to an automatic judge that compiles the code and feeds the executable with test data and verifies that the program delivers the correct output expected for the input.

After being out in real work I have noticed that I have had good use of these earlier exercises. One of the most important lessons is to know by heart the complexity of different algorithms and when to use a simple brute force method and when a more advanced algorithm must be implemented.

I noticed that the old Online Judge system that we practised on in the early 2000s still is online and has grown to a large site with lots of users solving programming challenges. If you want to recapitulate your algorithm skills this is a fun and very hands on way to do small exercises and get feedback in realtime. Also there is an addictive gamification effect of trying to solve more problems.

Browse the catalogue of problems and find some challenge that feels interesting. The problems are color coded so you know which one you already solved and some stats that gives a hint on how difficult the problems have been for other users to solve.

Once you have submitted your code you get instant feedback on how your code has performed on the judges test cases.

At the moment you may submit solutions in C/C++/C++11 and Java. Create an account at http://uva.onlinejudge.org/. Once you've done that you can follow your progress and your statistics in different ways. Here's my stats http://uhunt.felix-halim.net/id/426271.

I've recently done a couple of problems per night just for the fun of it. It feels exciting to program on a hard problem using just 20 lines of code instead of coding 1000 lines of boiler plate which I guess many of us do at our every day works.

Without trying to indicate that I'm an expert on complex algorithms, here are some things I think are valuable when getting serious about algorithm design and solving harder programming challenges.

Complexity theory

If you didn't take a course on complexity theory or don't remember the basics I think this is a must to have in the toolbox for every programmer. This implies knowing which problems are inherently hard. By hard we dont mean hard to implement a solution for, but hard in the sense that the resources (CPU time or memory) scales bad with larger and larger sets of input for the problem.

The standard way of measuring the complexity of an algorithm is to resolve the algorithms Ordo or Big-O function. The formal definition is that this Big-O function O() satifies

f(x) = O(g(x)) when x ➝ ∞ if and only if |f(x)| < M |g(x)| for all x > x₀

In English this means that there is a constant M which when multiplied with g(x) always will be larger (make an upper bound) on the function f(x) which is the time or memory usage of the algorithm for the input size x of the problem when x is sufficiently large (larger than some certain x₀). These values always considers the worst case input for the algorithm. This might need an example.

Consider the problem of sorting an array of n values. The basic bubble sort algorithm is probably the easiest algorithm to implement and uses two nested for loops to swap the values into a sorted order. Since the arrays have the size n, which is the size of the input of the problem, the Big-O function for this will be O(n²). Why? The number of operations needed to sort the array will be proportional to two nested sweeps over the array of size n, which makes M×n×n operations. So g(n) = n². Here the actual swap operation is the constant M number of machine operations.

Another problem, consider finding a value in a sorted tree (or in a sorted array for that matter via binary search). In this algorithm, each inspection of a node in the tree is proportional to one operation. Each inspection will cut the size of the tree in half so only one of the halfs will be needed to be examined in the next operation. This means that the number of operations needed to find the value will be less than the number of values in the tree. So we have an algorithm that is faster than the linear g(n) = n. In fact this class of problems are solved in logarithmic time proportional to the size of the input, or O(log(n)).

So going back to programming challenges, by knowing a lot of algorithms you might figure out which one will suit the particular problem. First of all, always find the bounds of the input data for your problem. What are the most extreme inputs we might expect to handle? Then, assuming the worst case scenario, what is the most easily implemented algorithm we can get away with within the requirements on time and space utilization?

For example, to sort an array of at most 10 000 000 values, bubble sort would require something proportional to n×n = 100 000 000 000 000 operations. Even with multi cores, this will not be fast. So by knowing that for example the heap sort algorithm is O(n log n) we know that this algorithm will sort the array proportional to 10 000 000 × 7 = 70 000 000. Definitively something that can be accomplished within a fraction of a second.

But assume that we know that the values in the array are all in the range of 1 to 1000 000. This extra clue can give us the hint to consider bucket sort which has larger memory requirements than heap sort but can perform the sort in linear time proportional to the input size, O(n).

Know your algorithms

The above example of sorting shows that it is essential to know a lot of algorithms. By "knowing", you don't need to know the implementation of them. You can always search the web for actual implementations, pseudo code or even find them implemented in some library. The important part is to know that they exist and for what particular problems they are useful and what requirements they put on input data, time and memory usage so you can make a decision on whether they are applicable for your problem.

There are specific algorithms targeting very specific problems such as Dijkstra's algorithm that solves the single-source shortest path problem for a graph with non-negative edge path costs. This is a rather common problem within the class of graph problems so it can be handy to know by heart, but maybe more interesting is to analyze what makes Dijkstra's algorithm successful. It is a typical example of Dynamic Programming (DP) which is a general approach of solving many hard problems by breaking them into smaller and simpler subproblems and solving the larger problem by combining solutions of smaller problems. Other general ideas that can be useful in many different contexts is divide and conquer algorithms, greedy algorithms and backtracking.

When considering the programming challenges in the Online Judge system, often the hard and most interesting part is to figure out what class the problem belongs to. Can the problem description be translated into a graph problem for example? In that case which graph algorithm will solve the problem?

Without any intention to be complete, here's some algorithms that can be valueable to know about and put in your toolbox together with some suggestions on problems to apply them on.

Sorting

When considering which algorithm to choose, think about what requirements you have and the expected input. For example, some algorithms are stable, where stable means that if two objects are considered equal (depending on the comparison function used when sorting), the relative order of them will be preserved. For example the quick sort algorithm is much more complicated to implement if it must behave in a stable manner. Furthermore, some algorithms have very nice average Big-O requirements but may deteriorate to O(n²) if the values to sort already are sorted. Ordo funtioncs are always given for worst case scenarios.

Merge sort is a popular O(n log n) sorting algorithm implemented in Perls sorting library. Timsort, O(n log n) is based on merge sort but can run in linear time if the input is already sorted. Timsort is deployed in JDK 7 and Python libraries. Heap sort has even less memory requirements than merge sort and runs in the same time asymptotically.

If you have the possibility to sort in parallel there are implementations of for example merge sort than can give even better performance.

UVA problems that could be relevant to practise on: 299 - Train Swapping, 10252 - Common Permutation.

Graph problems

The most important thing to consider when attacking graph problems is how to traverse the graph. Depth First Search (DFS) is easily implemented with recursion but can have problems with non termination if the graph's size is similar to the internet and if recursion is made in languages like Java or C, the stack will overflow in a certain amount of iterations (stackframes). Other languages like Scala can avoid new stack frames if the code is written tail recursively which internally is unwrapped as a while loop. In older languages, you must know how to convert your recursive solution to an imperative loop solution when search depths become substantial.

Adaptations to DFS can be for example iterative deepening, popular in implementations of game trees in for example Chess engines. This means that the search is performed to deeper and deeper depths in iterations making it possible to abort if for example a certain time constraint is broken. Useful when the branching factor of the searched tree is large. Alpha-beta pruning is a clever way to reduce the search space by limiting the branching factors in DFS in game trees.

Typical uses of DFS are finding a solution out of a maze, planarity testing, finding connected components of a graph, deciding biconnectivity of a graph and topological sorting of a directed graph.

Breadth First Search (BFS) is based on a queue data structure and is easily implemented without recursion. Some typical uses of BFS is to find all nodes within a connected component, finding the shortest path between two nodes of an unweighted graph and to find the maximum flow in a flow network.

Some practise problems on general graph and tree traversals are 167 - The Sultan's Successors, 115 - Climbing Trees, 297 - Quadtrees.

Combining DFS or BFS with ideas like greedy selection and DP and you have some of the most famous algorithms below.

Dijsktra's algorithm - Find the shortest path from a specific node to all other nodes in a graph with non-negative edge costs. The original algorithm from 1959 runs in O(|V|²) where |V| is the number of vertices in the graph. An improvement uses a Fibonacci heap as priority queue which runs in O(|E| + |V| log |V|) where |E| is the number of edges. However, implementing a Fibonacci heap is work on its own so only use it if necessary.

Bellman–Ford algorithm - Similar to Dijkstra's algorithm, Bellman-Ford finds shortests paths in a graph but can also handle negative edge weights (as long as there are no negative cycles since that means there is no shortest path).

Prim's and Kruskal's algorithms can be used to find a minimum spanning tree of a graph. That means, a tree that includes all vertices that are connected within the graph. These algorithms have similar time complexity as Dijkstra since the O(|V|²) can be improved to O(|E| + |V| log |V|) with a Fibonacci heap. Prim's algorithm is a good example of a greedy algorithm.

Floyd-Warshall's algorithm finds all shortest paths costs between all nodes in a graph with positive or negative edge weights (no negative cycles though). The simplest implementation can be extended to also keep a backtracking table so it is possible to not only find the path costs but also reconstruct the actual shortest path between all nodes. FW is a typical example of Dynamic Programming. FW runs in O(|V|³).
A good practice problem is 104 - Arbitrage.

Sequence and string problems

There are a lot of problems involving finding longest sequences or subsequences in different lists. These lists can be arrays of numbers or characters written as strings. Often a combination of sorting and some specific algorithm is best medicine.

The Longest Increasing Subsequence algorithm (LIS) will in the list 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 figure out that the longest sequence in ascending order must be 0, 2, 6, 9, 13, 15. Note there can of course be other equally long sequences. LIS runs in O(n log n) since its implementation is based on binary search.

The Longest common substring problem means to find out that "XYZ" is the longest common substring in the three strings "XYXYZ", "YXYZX" and "XYZYX". There are two types of algorithms for this, one builds a suffix tree and one is based on dynamic programming.

The problem of finding the minimum number of edit operations needed to turn one string into another is addressed by the String-to-string correction problem. A side effect of the algorithm for this is that by storing the deltas you can for example store many very large quite similar strings by only keeping one copy of the first string and then the small deltas of the edits needed to turn it into the other strings. An application would be to store for example DNA sequences or similar in this way. The Levenshtein distance is the number of changes needed to turn a string into another and can solve the String-to-string problem via a DP approach.

Practise problems: 111 - History Grading.

Number theory

Math problems can sometimes feel academical but there are important real life applications considering for example prime numbers and cryptography or how many ways something can be done using combinatorics.

Problems concerning prime numbers are common. To verify whether a certain number is prime (primality testing), different algorithms can be used. For moderate bounds on input, simply test for divisibility up to √n. Once n is large you might need faster ways to test for primality. One of the fastest algorithms which runs in polynomial time is the AKS test which can be implemented in O(log(n)⁶).

A tougher problem is to factor a number into its prime factors. The most difficult numbers are semiprimes which are the product of two primes. No polynomial algorithm is known to do this which is the base of RSA cryptography. If you know more about the properties of the composite number there can be shortcuts to factor the number. Numbers on the form r^e±s can be efficiently factored using the Special Number Field Sieve algorithm if r and s are relatively small. From this the General Number Field Sieve is derived which is the most efficient classical algorithm known for factoring integers > 100 digits.

There are problems where you need to find the greatest common divisor (gcd) of two numbers. This number can be found using the Euclidean algorithm which can be implemented as below where % is Javas syntax for modulus.

int gcd(int a, int b) {
   if (b != 0) {
      return gcd(b, a % b);
   }
   return Math.abs(a);
}

Once we have gcd, we can easily calculate the least common multiple (lcm) of two numbers as

Another problem that you will find in some of the UVA challenges includes raising large powers. If you are to calculate a^b and b is large the naive solution would include b number of multiplications of a. A much faster way is to use repeated squaring which means that you square the results to make the number of multiplications logarithmic in relation to b. Some pseudo code for this would be

/**
 * Calculates n to the p power, where 
 * p is a positive number.
 */
LargeInt power(LargeInt n, LargeInt p) {
   if (p == 0) return 1
   if (p == 1) return n
   if (isOdd(p)) {
      return n * power( n * n, (p - 1) / 2 )
   } else {
      return power( n * n, p / 2 )
   }
}

There are some problems on calculating Fibonacci sequences. If you wind up with Time Limit Exceeded (TLE) from the judge on these and you think you have made a really fast algorithm avoiding recursion and no double calculations, remember that factorials can be expressed in a closed form as well where F(n) is the n'th number in the sequence.

Problems: 136 - Ugly Numbers, 160 - Factors and Factorials, 583 - Prime Factors,

Combinatorics

Variations on combinatorics is a common theme. Assume you have a box of n items, in how many unique ways can you pick k items? 0 <= k <= n.
If you remember your discreet mathematics this is solved using binomial coefficients (Pascal's triangle). The academic version is often described as a divisions of factorials.

This is not a practical implementations however. The formula can be rewritten as a product of terms better suited for an engineering implemention.

Example problems: 10219 - Find the ways !, 11401 - Triangle Counting

Probability theory

Knowing some basic probability theory is not only good for programming challenges. Besides the basic definitions and axioms (Kolgomorov) there are lots of interesting topics such as stochastic processes and variables, probability distributions, Bayes theorem and more.

Practise problem: 10491 - Cows and Cars

Dynamic Programming

Dynamic programming (DP) is a methodology for attacking complex problems by splitting the problem into simpler subproblems which solutions can be reused to solve the original problem. The problem must have the property of overlapping subproblems. This means that the subproblems are occuring several times in the problem and can be reused in the DP approach. A simple example of this would be a function f(n) to calculate Fibonacci numbers. Since f(n) = f(n-1) + f(n-2), a DP approach would store each f(x) to reuse previously calculated valued. The stored values are called "memo-ized".

In general, DP is a technique to speed up solutions of problems where you make brute force searches and cache intermediate results.

Let's examine a more complex problem where DP actually makes great difference.

"Suppose you have x dollars and want to figure out in how many unique ways you get change for this in the coins 1c, 5c, 10c, 25c and 50c."

The brute force approach would be to nest five for loops and iterate over the five different coins and count the number of valid combinations. A very simple algorithm but it would not scale well with larger amounts of dollars or if we had more types of coins increasing the number of nested for loops.

Let's start thinking from a dynamic programming perspective. Can we make the problem simpler? Let's assume we have solved the same problem (in some at this time unknown way) for 4 currencies. So we know how many combinations you could get x dollars in 1c, 5c, 10c and 25c. Could this help us? Yes, then we could deduce that the total number of combinations would be the sum of 0, 1, 2 ... n number of 50c coins where n is the maximal number of 50c coins you could get out of x dollars. Each such number of combinations could be calculated by the already known function since it handles the other four coins.

Naturally, we can use the same approach to solve the problem for 4, 3 and 2 coins. The case of 1 coin is so simple that we can implement it without relying on any other assumed function.

Here's a recursive implementation of this idea.

 private int combinations(final int[] change, final int maxCoins,
  final int amount) {

   /*
    * Base case 1. If amount = 0, there is one combination -> no change.
    */
   if (amount == 0) {
    return 1;
   }

   /*
    * Base case 2. If negative amount, no solutions
    */
   if (amount < 0) {
    return 0;
   }

   /*
    * Base case 3. No change to work with -> no solutions
    */
   if (maxCoins <= 0) {
    return 0;
   }

   /*
    * Here goes the recursion, sum all combinations using one coin less
    * from the list of coins plus the number of combinations if we remove
    * one of the largest coins from the total amount.
    */
   final int c = combinations(change, maxCoins - 1, amount) + combinations(change, maxCoins, amount - change[maxCoins - 1]);

   return c;
  }

Call it with the number of available coins and the number of coins in the array to use together with the total amount of dollars (times 100 to get int values for cents).

int combs = combinations([1, 5, 10, 25, 50], 5, 500000);

This will work, but it will not be efficient. For each invocation that does not end in a base case, two recursive calls will be performed and the same subproblem will be recalculated over and over again.

The good thing is that we have divided the problem into specific overlapping subproblems. Now we can add the idea of dynamic programming. Add a cache that stores previous calculations as a base case to make the recursive solution only calculate subproblems that not have been encountered before.

    private int combinations(final int[] change, final int maxCoins,
            final int amount, final int lookupTable[][]) {

        /*
         * Base case 1. If amount = 0, there is one combination -> no change.
         */
        if (amount == 0) {
            return 1;
        }

        /*
         * Base case 2. If negative amount, no solutions
         */
        if (amount < 0) {
            return 0;
        }

        /*
         * Base case 3. No change to work with -> no solutions
         */
        if (maxCoins <= 0) {
            return 0;
        }

        /*
         * Base case 4. This is already calculated, so DP finds result in cache
         */
        if (lookupTable[maxCoins][amount] != -1) {
            return lookupTable[maxCoins][amount];
        }

        /*
         * Here goes the recursion, sum all combinations using one coin less
         * from the list of coins plus the number of combinations if we remove
         * one of the largest coins from the total amount.
         */
        final int c = combinations(change, maxCoins - 1, amount,
                lookupTable)
                + combinations(change, maxCoins, amount
                        - change[maxCoins - 1], lookupTable);

        lookupTable[maxCoins][amount] = c;

        return c;
    }

This will perform efficiently and is it is easy to understand the logic. Unfortunately a problem with recursive solutions in imperative languages like Java is that deep recursions will run out of stack space. So to make this solution even more generic we should rewrite it using loops instead of in the more aestethic functional style.

    private int[][] combinations(final int[] change, final int maxCoins,
            final int amount) {

        final int[][] lookupTable = new int[amount + 1][maxCoins + 1];

        for (int i = 0; i <= amount; i++) {
            for (int j = 0; j <= maxCoins; j++) {
                if (i == 0) {
                    lookupTable[i][j] = 1;
                } else if (j == 0) {
                    lookupTable[i][j] = 0;
                } else if (change[j - 1] > i) {
                    lookupTable[i][j] = lookupTable[i][j - 1];
                } else {
                    lookupTable[i][j] = lookupTable[i - change[j - 1]][j]
                            + lookupTable[i][j - 1];
                }
            }
        }

        return lookupTable;
    }

This function retrieves the all the memoized combinations and we find the solution to our problem as

        final int[] change = { 1, 5, 10, 25, 50 };
        final int[][] combs = combinations(change, change.length, maxDollars);
        final int   answer = combs[amountToChange][5];

In the UVA problem set you will often need to think from a DP perspective to create solutions that are fast enough for the input data and the time constraints.

Some good problems to practise on: 103 - Stacking Boxes, 108 - Maximum Sum

The particular coin change problem described above is available in several versions on UVA. Have a try at problem 147 - Dollars.

Divide and Conquer

Divide and conquer algorithms are based on multi branched recursion. It breaks down a problem into several sub problems and recurse on them. These can thereby be divided into subproblems until they are small enough to be easily solved. Then the solutions are used to generate the answer to the complex problem. Quicksort, merge sort, discrete Fourier transform (FFT), top down parsers used in syntactic analysis are D&C algorithms.

Another example that can be useful to know in the UVA problem challenges is that there is an algorithm called Karatsuba which multiplies two large integers (with thousand or millions of digits) fast. It runs in O(3n^log₂3) which is notably faster than the O(n²) method we learned in school. Karatsuba is a D&C algorithm. Binary search could be called a simple form of D&C.

Suitable practise problems: 10341 - Solve It

Greedy algorithms

A greedy algorithm makes local optimal choices at each stage during the algorithm in hope of producing a global optimum. Many problems will fail to do so with a greedy approach since the complete search space is not visited. But for certain problems greedy selection works well and can therefore create fast solutions. In other cases where the optimal solution is not possible to calculate in an efficient way, a greedy solution might come up with a decent approximation or bounded result.

Examples of greedy algorithms are the construction of a Huffman tree during Huffman coding or Kruskal's algorithm for finding a minimum spanning tree of a graph.

Practise problem: 11389 - The Bus Driver Problem

Backtracking

Backtracking is a technique that finds one or all solutions to a problem by trying different candidates and abandon those branches that breaks the constraints of the problem which means that the current path cannot lead to a valid solution of the problem.

An example where backtracking would be suitable is a solver for a Sudoku puzzle. By doing a depth first search and inserting new numbers in empty positions and for each insertion checking whether the new number breaks any constraints of the rules of Sudoku, the board can be iteratively filled with valid numbers. As soon as a number violates the rules and all numbers have been tried in the current position, the algorithm must backtrack to the previous position and try another number.

Examples where backtracking is suitable are puzzles and combinatorial optimization problems such as the knapsack problem.

A good practise problem is the classic Eight Queens puzzle where you should position eight queens on a chessboard without any queen threatening another. Here's a few modifications on that theme on the UVA site 11085 - Back to the 8-Queens, 167 - The Sultan's Successors.

UVA Online Judge

The online judge system at http://uva.onlinejudge.org/ accepts code in C/C++/C++11/Java. Some hints for beginners might be to start with an easy problem to get to know the judge. Good choices might be problem 10018 - Reverse and Add or 11172 - Relational Operator.

Important, read the input and output constraints carefully. For example, if the input consists of integers and are between 0 and 4,294,967,295 remember that a signed 32 bit integer will overflow so you should use a 64-bit integer or an unsigned int depending on the language you code in.

Output format matters, so be careful with trailing spaces and linefeeds. Sometimes you simply must try with or without an ending newline since the problem description won't tell which one is correct unfortunately.

Create a good template which you can copy when starting on a new problem. If you code in Java you will probably have everything you need in the standard library such as BigIntegers and much more. If you code in C/C++ it can be good to add some routines for handling arbitrary large numbers to your template.

If you get stuck on a problem try to create test cases for all the border cases allowed in the input. Try the worst possible input and also the smallest possible input to see that you get reasonable output.

Many basic problems can be solved with simple ad hoc brute force solutions but the more interesting problems with greater time complexities will perform very badly with naive solutions. The problems are often devised so that brute force solutions will not be able to perform within the time limits. However, once you've created a fancy solution based on DP or some other approach it is not always self evident that the algorithm is correct. In those cases it can still be valuable to run the outputs from your algorithm against a quickly coded brute force solution to verify that they produce the same output.

If you still is stuck and can't figure out why the online judge gives your solution "Wrong Answer" you can check the forums where many helpful programmers gives hints or test data to test your code against.

Good luck!

Edit 2014-05-01:
I got feedback on whether I could recommend any books for those interested to dig deeper. I got two books which I bought ten years ago which I still think should be relevant.

Programming Challenges: The Programming Contest Training Manual by Skiena and Revilla is exactly what it sounds like. A walktrough of different types of programming challenges and techniques for solving them.

Algorithms by Sedgewick and Wayne is more a general CS book and a good read.

Monday, 17 March 2014

New Eclipse project connected to Git repository

You have a new Eclipse project to work on and you want to version control it on a Git repository on another machine. Here's how you do it.

1. On the server you wish to have your Git repository remotely stored, setup an empty repository. In our examples here the repository and project will be called UVaOnline

johan@htpc ~/gitrepos $ pwd

/home/johan/gitrepos

johan@htpc ~/gitrepos $ mkdir UVaOnline.git

johan@htpc ~/gitrepos $ cd UVaOnline.git/

johan@htpc ~/gitrepos/UVaOnline.git $ git --bare init --shared=all

2. Add a file to this empty project if you wish to make sure there is something to check out from Eclipse.

johan@htpc ~/gitrepos $ cd /tmp

johan@htpc /tmp $ git clone /home/johan/gitrepos/UVaOnline.git/

johan@htpc /tmp $ cd UVaOnline/

johan@htpc /tmp $ touch FOO

johan@htpc /tmp $ git add FOO

johan@htpc /tmp $ git commit

johan@htpc /tmp $ git push origin master

3. In Eclipse, add the Git Repository Exploring perspective via the menu Window -> Open Perspective.

4. Click the "Clone a Git repository and add the clone to this view" icon in the perspective.

5. In the first step of this wizard, enter the connection details for the Git repository. In my case I use my SSH user to log on the server.

6. If your project was emtpy, you only have the master branch to work with. Click next.

7. Choose where you wish to have your local Git repository on the machine Eclipse is installed on. I selected the "Import all existing projects after clone finishes".

8. Now a new repository should be available in the Git repository Explorer.

9. Try to fetch the repository if you want to see the refs configurred.

10. You see the connection URL again, next.

11. Here you could add or delete refs pointers. For our purposes, this looks alright.

12. To import the project into the Java/Java EE perspectives, click the "Import Projects" option in the menu.

13. Since this project is a bare bones project with no .project file with Java facets, we must select to import the project as a "Import as general project".

14. Probably the default name is good enough.

15. Now you could start adding file to your project. I made my project a Maven project here and added some source code. Once you're ready to commit to the remote repository, right-click and choose Team -> Synchronise Workspace.

16. In the next view, select the files you wish to commit and add them to Git index first.

17. Then right click again and commit.

18. If you do not click "Commit and push" the files are only added to your local git repository on the same machine. That can be good if you want to pile up a bunch of local commits before pushing to the remote repository. When you want to push all local commits, in the Java perspective right click and go via Remote to Push.

19. If it is the first time, you might need to add a spec for this. If you work on master which you do if you have followed the tutorial, simple add "refs/heads/master" as source and destination ref and click "Add spec".

20. New spec added.

21. Once finished, the commits you had locally are pushed to the remote repo.

Thats it. I have some more stuff on working with Git in Eclipse in this blogpost about Github cloning.

For more advanced Git usage, you might need the console. That's no problem, simply make sure to refresh the Eclipse project after working in the console since Eclipse must refresh the .git folder etcetera. See this Git cheat sheat for my favourites.

Tuesday, 18 February 2014

Porthello Legends - Android Reversi Port

I wrote a clone of the game Reversi/Othello for Android three years ago and it actually made some progress on Google Play (Market as it was called then). It was downloaded nearly 1000 times and a lot of players seems to have enjoyed it because I could see in the highscore lists and the Analytics statistics that they spent quite some time on trying to beat the AI to gain access to even harder levels.

Unfortunately, the name Othello Legends was a bad choice since it was trademarked. That meant that after 6 months the game was suspended from Google Play. Since I put quite a lot of effort into the game it is a shame that it disappeared and I have now done a refactoring to the new name Porthello Legends :-).

My Android tablet died last summer and all I got to test with is an pretty old HTC Desire which the game seems to work fine on. It would be interesting to get feedback on how the game performs on newer phones/tablets as well.

Check out the game at Google Play, search for "Porthello Legends" or go to https://play.google.com/store/apps/details?id=se.noren.android.porthello&hl=en.

The old blogpost on the inital release can be found here http://macgyverdev.blogspot.se/2011/12/othello-legends-10-in-android-market.html.

Tuesday, 11 February 2014

Apache web server as reverse proxy and virtual host for Tomcat 7

I'm running a couple of sites on a Linux server on my local network. I only have one public ip address but I want to be able to host multiple sites on my servers. So this can be done by using a reverse proxy in front of the application servers which delegates traffic depending on which host name the end user was trying to request.

I've implemented this by using the reverse proxy feature with virtual hosts in the Apache HTTP web server.
The result this exercise is trying to achieve is that external traffic from internet requesting pages from the site www.all-about-units.com which is DNS mapped to my public ip address will be routed to the Apache web server on the local network via a port forwarding rule in the router on the local network. The Apache web server will look into its rules for virtual hosts and see that the user requested the domain www.all-about-units.com and therefore delegate the traffic to a separate Apache Tomcat 7 servlet container running on the local network to serve the request. The traffic is then forwarded back through the web server to the end user.

Setup Apache HTTP server as reverse proxy

First of all the Apache HTTP web server must be installed. I had mine installed since long ago but if I remember correctly there's not much than doing this is you have a Debian/Ubuntu/Mint like Linux server

sudo apt-get install apache2

Now, to enable the reverse proxy feature of the Apache server

/ $ sudo a2enmod proxy_http

Next, we must create a configuration for our new site as a virtual host. Go to the directory of available sites

/etc/apache2/conf.d $ cd /etc/apache2/sites-available/

Copy the default configuration to a setup for your site

/etc/apache2/sites-available $ sudo cp default all-about-units

Lets edit the configuration file. The first line tells the Apache server to listen on port 80, the standard HTTP port. Next the names of the virtual host are given. So traffic requesting another domain name or the ip address directly will not be handled by this virtual host configuration.

Next look att the ProxyPass and ProxyPassReverse, these tell Apache where to forward the request that has matched this virtual host rule. In this case to the Tomcat server on the same machine (127.0.0.1 is the loopback address of localhost) but on port 8080. The slash before the address means that all requests should be forwarded to the Tomcat server. If for example you had ProxyPass /foo/bar/ http://127.0.0.1:8080/ then the request www.all-about-units.com/foo/bar/page.html would be redirected to 127.0.0.1:8080/page.html

/etc/apache2/sites-available $ sudo vim all-about-units



<VirtualHost *:80>

  ServerName all-about-units.com

  ServerAlias www.all-about-units.com




  ProxyRequests Off

  ProxyPreserveHost On




  <Proxy *>

    Order deny,allow

    Allow from all

  </Proxy>




  ProxyPass / http://127.0.0.1:8080/

  ProxyPassReverse / http://127.0.0.1:8080/

</VirtualHost>

Enable and disable the site

To tell the Apache HTTP web server to enable this configuration, run the a2ensite command on the name of the new configuration

/etc/apache2/sites-available $ sudo a2ensite all-about-units


Enabling site all-about-units.

To activate the new configuration, you need to run:

  service apache2 reload

Reload Apache



/etc/apache2/sites-available $ sudo service apache2 reload

 * Reloading web server config

I had some issues so I restarted Apache the hard way also and then it works.

/etc/apache2/sites-available $  sudo /etc/init.d/apache2 restart



 * Restarting web server apache2                                                          apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName

 ... waiting apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName

That's it!

Verify in the logs of the application server that the request are directed correct and your home.
Next step might be to enable the mod cache feature of Apache to let the web server handle caching of all static material served by the application server like css, images and javascript. More on that in another post.

If you for some reason want to take the site of the web for some time you can instead of stopping the application server disable the site in the web server virtual host forwarding by using the command

sudo a2dissite all-about-units

Tuesday, 4 February 2014

How to make Jenkins install the packaged war in a Tomcat 7

This is a sequel to http://macgyverdev.blogspot.se/2014/01/setup-jenkins-project-from-git.html where I set up Jenkins to build my latest project. The next step is to install the war-file built by Jenkins on my Tomcat 7 server.

First of all we must make sure Tomcat accepts installations by scripting instead of the human html GUI version. In the tomcat-users.xml file, located in something similar to this:

/var/lib/tomcat7/conf/tomcat-users.xml

Make sure that the admin user, whatever his name is, has the role manager-script. Here's the important part of my file:

<tomcat-users>

</tomcat-users>

Now, in Jenkins you must install the Deploy Plugin. You find it in the plugins section.

Your Jenkins job is here assumed to be configured to package a war file. You can verify this by inspecting that a build generates a war file like this:

~/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war

For this Jenkins job, go to the settings page and add a new post-build action. Choose the "Deploy war/ear to a container".

In this view make sure the path to the packaged war is relative to the workspace area of the job. Most probably it will be "target/yourapp.war".

Now, rebuild your job and when inspecting the console logs you should see that the war/ear is installed!

Deploying /home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war to container Tomcat 7.x Remote
  Redeploying [/home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war]
  Undeploying [/home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war]
  Deploying [/home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war]
Finished: SUCCESS

This way it takes two clicks from editing the code in Eclipse to update the site All About Units in production, first commit and push to Git, second start the Jenkins job.

One note, if you have deployed your application with context root "/" in Tomcat as I have done, see http://macgyverdev.blogspot.se/2014/02/how-to-change-context-root-to-in-tomcat.html, you can not have Context path = "/" in the configuration above. Instead, keep multiple context roots in Tomcat, and as in my example add the context root with a longer name, otherwise the deploy plugin cannot undeploy the ROOT web application. If you according to my blog post on it has manipulated the server.xml a reinstallation of "/webappname" will also reinstall the ROOT application since they point to the same directory on disk.