MacGyver Development

Monday, 28 April 2014

Become a better programmer with programming challenges

When I went to university me and some friends spent time on practising algorithm implementations for the national programming contests that were held each year. The concept was (and still is) to solve about ten problems in a couple of hours time using a programming language of choice. The source code is submitted to an automatic judge that compiles the code and feeds the executable with test data and verifies that the program delivers the correct output expected for the input.

After being out in real work I have noticed that I have had good use of these earlier exercises. One of the most important lessons is to know by heart the complexity of different algorithms and when to use a simple brute force method and when a more advanced algorithm must be implemented.

I noticed that the old Online Judge system that we practised on in the early 2000s still is online and has grown to a large site with lots of users solving programming challenges. If you want to recapitulate your algorithm skills this is a fun and very hands on way to do small exercises and get feedback in realtime. Also there is an addictive gamification effect of trying to solve more problems.

Browse the catalogue of problems and find some challenge that feels interesting. The problems are color coded so you know which one you already solved and some stats that gives a hint on how difficult the problems have been for other users to solve.

Once you have submitted your code you get instant feedback on how your code has performed on the judges test cases.

At the moment you may submit solutions in C/C++/C++11 and Java. Create an account at http://uva.onlinejudge.org/. Once you've done that you can follow your progress and your statistics in different ways. Here's my stats http://uhunt.felix-halim.net/id/426271.

I've recently done a couple of problems per night just for the fun of it. It feels exciting to program on a hard problem using just 20 lines of code instead of coding 1000 lines of boiler plate which I guess many of us do at our every day works.

Without trying to indicate that I'm an expert on complex algorithms, here are some things I think are valuable when getting serious about algorithm design and solving harder programming challenges.

Complexity theory

If you didn't take a course on complexity theory or don't remember the basics I think this is a must to have in the toolbox for every programmer. This implies knowing which problems are inherently hard. By hard we dont mean hard to implement a solution for, but hard in the sense that the resources (CPU time or memory) scales bad with larger and larger sets of input for the problem.

The standard way of measuring the complexity of an algorithm is to resolve the algorithms Ordo or Big-O function. The formal definition is that this Big-O function O() satifies

f(x) = O(g(x)) when x ➝ ∞ if and only if |f(x)| < M |g(x)| for all x > x₀

In English this means that there is a constant M which when multiplied with g(x) always will be larger (make an upper bound) on the function f(x) which is the time or memory usage of the algorithm for the input size x of the problem when x is sufficiently large (larger than some certain x₀). These values always considers the worst case input for the algorithm. This might need an example.

Consider the problem of sorting an array of n values. The basic bubble sort algorithm is probably the easiest algorithm to implement and uses two nested for loops to swap the values into a sorted order. Since the arrays have the size n, which is the size of the input of the problem, the Big-O function for this will be O(n²). Why? The number of operations needed to sort the array will be proportional to two nested sweeps over the array of size n, which makes M×n×n operations. So g(n) = n². Here the actual swap operation is the constant M number of machine operations.

Another problem, consider finding a value in a sorted tree (or in a sorted array for that matter via binary search). In this algorithm, each inspection of a node in the tree is proportional to one operation. Each inspection will cut the size of the tree in half so only one of the halfs will be needed to be examined in the next operation. This means that the number of operations needed to find the value will be less than the number of values in the tree. So we have an algorithm that is faster than the linear g(n) = n. In fact this class of problems are solved in logarithmic time proportional to the size of the input, or O(log(n)).

So going back to programming challenges, by knowing a lot of algorithms you might figure out which one will suit the particular problem. First of all, always find the bounds of the input data for your problem. What are the most extreme inputs we might expect to handle? Then, assuming the worst case scenario, what is the most easily implemented algorithm we can get away with within the requirements on time and space utilization?

For example, to sort an array of at most 10 000 000 values, bubble sort would require something proportional to n×n = 100 000 000 000 000 operations. Even with multi cores, this will not be fast. So by knowing that for example the heap sort algorithm is O(n log n) we know that this algorithm will sort the array proportional to 10 000 000 × 7 = 70 000 000. Definitively something that can be accomplished within a fraction of a second.

But assume that we know that the values in the array are all in the range of 1 to 1000 000. This extra clue can give us the hint to consider bucket sort which has larger memory requirements than heap sort but can perform the sort in linear time proportional to the input size, O(n).

Know your algorithms

The above example of sorting shows that it is essential to know a lot of algorithms. By "knowing", you don't need to know the implementation of them. You can always search the web for actual implementations, pseudo code or even find them implemented in some library. The important part is to know that they exist and for what particular problems they are useful and what requirements they put on input data, time and memory usage so you can make a decision on whether they are applicable for your problem.

There are specific algorithms targeting very specific problems such as Dijkstra's algorithm that solves the single-source shortest path problem for a graph with non-negative edge path costs. This is a rather common problem within the class of graph problems so it can be handy to know by heart, but maybe more interesting is to analyze what makes Dijkstra's algorithm successful. It is a typical example of Dynamic Programming (DP) which is a general approach of solving many hard problems by breaking them into smaller and simpler subproblems and solving the larger problem by combining solutions of smaller problems. Other general ideas that can be useful in many different contexts is divide and conquer algorithms, greedy algorithms and backtracking.

When considering the programming challenges in the Online Judge system, often the hard and most interesting part is to figure out what class the problem belongs to. Can the problem description be translated into a graph problem for example? In that case which graph algorithm will solve the problem?

Without any intention to be complete, here's some algorithms that can be valueable to know about and put in your toolbox together with some suggestions on problems to apply them on.

Sorting

When considering which algorithm to choose, think about what requirements you have and the expected input. For example, some algorithms are stable, where stable means that if two objects are considered equal (depending on the comparison function used when sorting), the relative order of them will be preserved. For example the quick sort algorithm is much more complicated to implement if it must behave in a stable manner. Furthermore, some algorithms have very nice average Big-O requirements but may deteriorate to O(n²) if the values to sort already are sorted. Ordo funtioncs are always given for worst case scenarios.

Merge sort is a popular O(n log n) sorting algorithm implemented in Perls sorting library. Timsort, O(n log n) is based on merge sort but can run in linear time if the input is already sorted. Timsort is deployed in JDK 7 and Python libraries. Heap sort has even less memory requirements than merge sort and runs in the same time asymptotically.

If you have the possibility to sort in parallel there are implementations of for example merge sort than can give even better performance.

UVA problems that could be relevant to practise on: 299 - Train Swapping, 10252 - Common Permutation.

Graph problems

The most important thing to consider when attacking graph problems is how to traverse the graph. Depth First Search (DFS) is easily implemented with recursion but can have problems with non termination if the graph's size is similar to the internet and if recursion is made in languages like Java or C, the stack will overflow in a certain amount of iterations (stackframes). Other languages like Scala can avoid new stack frames if the code is written tail recursively which internally is unwrapped as a while loop. In older languages, you must know how to convert your recursive solution to an imperative loop solution when search depths become substantial.

Adaptations to DFS can be for example iterative deepening, popular in implementations of game trees in for example Chess engines. This means that the search is performed to deeper and deeper depths in iterations making it possible to abort if for example a certain time constraint is broken. Useful when the branching factor of the searched tree is large. Alpha-beta pruning is a clever way to reduce the search space by limiting the branching factors in DFS in game trees.

Typical uses of DFS are finding a solution out of a maze, planarity testing, finding connected components of a graph, deciding biconnectivity of a graph and topological sorting of a directed graph.

Breadth First Search (BFS) is based on a queue data structure and is easily implemented without recursion. Some typical uses of BFS is to find all nodes within a connected component, finding the shortest path between two nodes of an unweighted graph and to find the maximum flow in a flow network.

Some practise problems on general graph and tree traversals are 167 - The Sultan's Successors, 115 - Climbing Trees, 297 - Quadtrees.

Combining DFS or BFS with ideas like greedy selection and DP and you have some of the most famous algorithms below.

Dijsktra's algorithm - Find the shortest path from a specific node to all other nodes in a graph with non-negative edge costs. The original algorithm from 1959 runs in O(|V|²) where |V| is the number of vertices in the graph. An improvement uses a Fibonacci heap as priority queue which runs in O(|E| + |V| log |V|) where |E| is the number of edges. However, implementing a Fibonacci heap is work on its own so only use it if necessary.

Bellman–Ford algorithm - Similar to Dijkstra's algorithm, Bellman-Ford finds shortests paths in a graph but can also handle negative edge weights (as long as there are no negative cycles since that means there is no shortest path).

Prim's and Kruskal's algorithms can be used to find a minimum spanning tree of a graph. That means, a tree that includes all vertices that are connected within the graph. These algorithms have similar time complexity as Dijkstra since the O(|V|²) can be improved to O(|E| + |V| log |V|) with a Fibonacci heap. Prim's algorithm is a good example of a greedy algorithm.

Floyd-Warshall's algorithm finds all shortest paths costs between all nodes in a graph with positive or negative edge weights (no negative cycles though). The simplest implementation can be extended to also keep a backtracking table so it is possible to not only find the path costs but also reconstruct the actual shortest path between all nodes. FW is a typical example of Dynamic Programming. FW runs in O(|V|³).
A good practice problem is 104 - Arbitrage.

Sequence and string problems

There are a lot of problems involving finding longest sequences or subsequences in different lists. These lists can be arrays of numbers or characters written as strings. Often a combination of sorting and some specific algorithm is best medicine.

The Longest Increasing Subsequence algorithm (LIS) will in the list 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 figure out that the longest sequence in ascending order must be 0, 2, 6, 9, 13, 15. Note there can of course be other equally long sequences. LIS runs in O(n log n) since its implementation is based on binary search.

The Longest common substring problem means to find out that "XYZ" is the longest common substring in the three strings "XYXYZ", "YXYZX" and "XYZYX". There are two types of algorithms for this, one builds a suffix tree and one is based on dynamic programming.

The problem of finding the minimum number of edit operations needed to turn one string into another is addressed by the String-to-string correction problem. A side effect of the algorithm for this is that by storing the deltas you can for example store many very large quite similar strings by only keeping one copy of the first string and then the small deltas of the edits needed to turn it into the other strings. An application would be to store for example DNA sequences or similar in this way. The Levenshtein distance is the number of changes needed to turn a string into another and can solve the String-to-string problem via a DP approach.

Practise problems: 111 - History Grading.

Number theory

Math problems can sometimes feel academical but there are important real life applications considering for example prime numbers and cryptography or how many ways something can be done using combinatorics.

Problems concerning prime numbers are common. To verify whether a certain number is prime (primality testing), different algorithms can be used. For moderate bounds on input, simply test for divisibility up to √n. Once n is large you might need faster ways to test for primality. One of the fastest algorithms which runs in polynomial time is the AKS test which can be implemented in O(log(n)⁶).

A tougher problem is to factor a number into its prime factors. The most difficult numbers are semiprimes which are the product of two primes. No polynomial algorithm is known to do this which is the base of RSA cryptography. If you know more about the properties of the composite number there can be shortcuts to factor the number. Numbers on the form r^e±s can be efficiently factored using the Special Number Field Sieve algorithm if r and s are relatively small. From this the General Number Field Sieve is derived which is the most efficient classical algorithm known for factoring integers > 100 digits.

There are problems where you need to find the greatest common divisor (gcd) of two numbers. This number can be found using the Euclidean algorithm which can be implemented as below where % is Javas syntax for modulus.

int gcd(int a, int b) {
   if (b != 0) {
      return gcd(b, a % b);
   }
   return Math.abs(a);
}

Once we have gcd, we can easily calculate the least common multiple (lcm) of two numbers as

Another problem that you will find in some of the UVA challenges includes raising large powers. If you are to calculate a^b and b is large the naive solution would include b number of multiplications of a. A much faster way is to use repeated squaring which means that you square the results to make the number of multiplications logarithmic in relation to b. Some pseudo code for this would be

/**
 * Calculates n to the p power, where 
 * p is a positive number.
 */
LargeInt power(LargeInt n, LargeInt p) {
   if (p == 0) return 1
   if (p == 1) return n
   if (isOdd(p)) {
      return n * power( n * n, (p - 1) / 2 )
   } else {
      return power( n * n, p / 2 )
   }
}

There are some problems on calculating Fibonacci sequences. If you wind up with Time Limit Exceeded (TLE) from the judge on these and you think you have made a really fast algorithm avoiding recursion and no double calculations, remember that factorials can be expressed in a closed form as well where F(n) is the n'th number in the sequence.

Problems: 136 - Ugly Numbers, 160 - Factors and Factorials, 583 - Prime Factors,

Combinatorics

Variations on combinatorics is a common theme. Assume you have a box of n items, in how many unique ways can you pick k items? 0 <= k <= n.
If you remember your discreet mathematics this is solved using binomial coefficients (Pascal's triangle). The academic version is often described as a divisions of factorials.

This is not a practical implementations however. The formula can be rewritten as a product of terms better suited for an engineering implemention.

Example problems: 10219 - Find the ways !, 11401 - Triangle Counting

Probability theory

Knowing some basic probability theory is not only good for programming challenges. Besides the basic definitions and axioms (Kolgomorov) there are lots of interesting topics such as stochastic processes and variables, probability distributions, Bayes theorem and more.

Practise problem: 10491 - Cows and Cars

Dynamic Programming

Dynamic programming (DP) is a methodology for attacking complex problems by splitting the problem into simpler subproblems which solutions can be reused to solve the original problem. The problem must have the property of overlapping subproblems. This means that the subproblems are occuring several times in the problem and can be reused in the DP approach. A simple example of this would be a function f(n) to calculate Fibonacci numbers. Since f(n) = f(n-1) + f(n-2), a DP approach would store each f(x) to reuse previously calculated valued. The stored values are called "memo-ized".

In general, DP is a technique to speed up solutions of problems where you make brute force searches and cache intermediate results.

Let's examine a more complex problem where DP actually makes great difference.

"Suppose you have x dollars and want to figure out in how many unique ways you get change for this in the coins 1c, 5c, 10c, 25c and 50c."

The brute force approach would be to nest five for loops and iterate over the five different coins and count the number of valid combinations. A very simple algorithm but it would not scale well with larger amounts of dollars or if we had more types of coins increasing the number of nested for loops.

Let's start thinking from a dynamic programming perspective. Can we make the problem simpler? Let's assume we have solved the same problem (in some at this time unknown way) for 4 currencies. So we know how many combinations you could get x dollars in 1c, 5c, 10c and 25c. Could this help us? Yes, then we could deduce that the total number of combinations would be the sum of 0, 1, 2 ... n number of 50c coins where n is the maximal number of 50c coins you could get out of x dollars. Each such number of combinations could be calculated by the already known function since it handles the other four coins.

Naturally, we can use the same approach to solve the problem for 4, 3 and 2 coins. The case of 1 coin is so simple that we can implement it without relying on any other assumed function.

Here's a recursive implementation of this idea.

 private int combinations(final int[] change, final int maxCoins,
  final int amount) {

   /*
    * Base case 1. If amount = 0, there is one combination -> no change.
    */
   if (amount == 0) {
    return 1;
   }

   /*
    * Base case 2. If negative amount, no solutions
    */
   if (amount < 0) {
    return 0;
   }

   /*
    * Base case 3. No change to work with -> no solutions
    */
   if (maxCoins <= 0) {
    return 0;
   }

   /*
    * Here goes the recursion, sum all combinations using one coin less
    * from the list of coins plus the number of combinations if we remove
    * one of the largest coins from the total amount.
    */
   final int c = combinations(change, maxCoins - 1, amount) + combinations(change, maxCoins, amount - change[maxCoins - 1]);

   return c;
  }

Call it with the number of available coins and the number of coins in the array to use together with the total amount of dollars (times 100 to get int values for cents).

int combs = combinations([1, 5, 10, 25, 50], 5, 500000);

This will work, but it will not be efficient. For each invocation that does not end in a base case, two recursive calls will be performed and the same subproblem will be recalculated over and over again.

The good thing is that we have divided the problem into specific overlapping subproblems. Now we can add the idea of dynamic programming. Add a cache that stores previous calculations as a base case to make the recursive solution only calculate subproblems that not have been encountered before.

    private int combinations(final int[] change, final int maxCoins,
            final int amount, final int lookupTable[][]) {

        /*
         * Base case 1. If amount = 0, there is one combination -> no change.
         */
        if (amount == 0) {
            return 1;
        }

        /*
         * Base case 2. If negative amount, no solutions
         */
        if (amount < 0) {
            return 0;
        }

        /*
         * Base case 3. No change to work with -> no solutions
         */
        if (maxCoins <= 0) {
            return 0;
        }

        /*
         * Base case 4. This is already calculated, so DP finds result in cache
         */
        if (lookupTable[maxCoins][amount] != -1) {
            return lookupTable[maxCoins][amount];
        }

        /*
         * Here goes the recursion, sum all combinations using one coin less
         * from the list of coins plus the number of combinations if we remove
         * one of the largest coins from the total amount.
         */
        final int c = combinations(change, maxCoins - 1, amount,
                lookupTable)
                + combinations(change, maxCoins, amount
                        - change[maxCoins - 1], lookupTable);

        lookupTable[maxCoins][amount] = c;

        return c;
    }

This will perform efficiently and is it is easy to understand the logic. Unfortunately a problem with recursive solutions in imperative languages like Java is that deep recursions will run out of stack space. So to make this solution even more generic we should rewrite it using loops instead of in the more aestethic functional style.

    private int[][] combinations(final int[] change, final int maxCoins,
            final int amount) {

        final int[][] lookupTable = new int[amount + 1][maxCoins + 1];

        for (int i = 0; i <= amount; i++) {
            for (int j = 0; j <= maxCoins; j++) {
                if (i == 0) {
                    lookupTable[i][j] = 1;
                } else if (j == 0) {
                    lookupTable[i][j] = 0;
                } else if (change[j - 1] > i) {
                    lookupTable[i][j] = lookupTable[i][j - 1];
                } else {
                    lookupTable[i][j] = lookupTable[i - change[j - 1]][j]
                            + lookupTable[i][j - 1];
                }
            }
        }

        return lookupTable;
    }

This function retrieves the all the memoized combinations and we find the solution to our problem as

        final int[] change = { 1, 5, 10, 25, 50 };
        final int[][] combs = combinations(change, change.length, maxDollars);
        final int   answer = combs[amountToChange][5];

In the UVA problem set you will often need to think from a DP perspective to create solutions that are fast enough for the input data and the time constraints.

Some good problems to practise on: 103 - Stacking Boxes, 108 - Maximum Sum

The particular coin change problem described above is available in several versions on UVA. Have a try at problem 147 - Dollars.

Divide and Conquer

Divide and conquer algorithms are based on multi branched recursion. It breaks down a problem into several sub problems and recurse on them. These can thereby be divided into subproblems until they are small enough to be easily solved. Then the solutions are used to generate the answer to the complex problem. Quicksort, merge sort, discrete Fourier transform (FFT), top down parsers used in syntactic analysis are D&C algorithms.

Another example that can be useful to know in the UVA problem challenges is that there is an algorithm called Karatsuba which multiplies two large integers (with thousand or millions of digits) fast. It runs in O(3n^log₂3) which is notably faster than the O(n²) method we learned in school. Karatsuba is a D&C algorithm. Binary search could be called a simple form of D&C.

Suitable practise problems: 10341 - Solve It

Greedy algorithms

A greedy algorithm makes local optimal choices at each stage during the algorithm in hope of producing a global optimum. Many problems will fail to do so with a greedy approach since the complete search space is not visited. But for certain problems greedy selection works well and can therefore create fast solutions. In other cases where the optimal solution is not possible to calculate in an efficient way, a greedy solution might come up with a decent approximation or bounded result.

Examples of greedy algorithms are the construction of a Huffman tree during Huffman coding or Kruskal's algorithm for finding a minimum spanning tree of a graph.

Practise problem: 11389 - The Bus Driver Problem

Backtracking

Backtracking is a technique that finds one or all solutions to a problem by trying different candidates and abandon those branches that breaks the constraints of the problem which means that the current path cannot lead to a valid solution of the problem.

An example where backtracking would be suitable is a solver for a Sudoku puzzle. By doing a depth first search and inserting new numbers in empty positions and for each insertion checking whether the new number breaks any constraints of the rules of Sudoku, the board can be iteratively filled with valid numbers. As soon as a number violates the rules and all numbers have been tried in the current position, the algorithm must backtrack to the previous position and try another number.

Examples where backtracking is suitable are puzzles and combinatorial optimization problems such as the knapsack problem.

A good practise problem is the classic Eight Queens puzzle where you should position eight queens on a chessboard without any queen threatening another. Here's a few modifications on that theme on the UVA site 11085 - Back to the 8-Queens, 167 - The Sultan's Successors.

UVA Online Judge

The online judge system at http://uva.onlinejudge.org/ accepts code in C/C++/C++11/Java. Some hints for beginners might be to start with an easy problem to get to know the judge. Good choices might be problem 10018 - Reverse and Add or 11172 - Relational Operator.

Important, read the input and output constraints carefully. For example, if the input consists of integers and are between 0 and 4,294,967,295 remember that a signed 32 bit integer will overflow so you should use a 64-bit integer or an unsigned int depending on the language you code in.

Output format matters, so be careful with trailing spaces and linefeeds. Sometimes you simply must try with or without an ending newline since the problem description won't tell which one is correct unfortunately.

Create a good template which you can copy when starting on a new problem. If you code in Java you will probably have everything you need in the standard library such as BigIntegers and much more. If you code in C/C++ it can be good to add some routines for handling arbitrary large numbers to your template.

If you get stuck on a problem try to create test cases for all the border cases allowed in the input. Try the worst possible input and also the smallest possible input to see that you get reasonable output.

Many basic problems can be solved with simple ad hoc brute force solutions but the more interesting problems with greater time complexities will perform very badly with naive solutions. The problems are often devised so that brute force solutions will not be able to perform within the time limits. However, once you've created a fancy solution based on DP or some other approach it is not always self evident that the algorithm is correct. In those cases it can still be valuable to run the outputs from your algorithm against a quickly coded brute force solution to verify that they produce the same output.

If you still is stuck and can't figure out why the online judge gives your solution "Wrong Answer" you can check the forums where many helpful programmers gives hints or test data to test your code against.

Good luck!

Edit 2014-05-01:
I got feedback on whether I could recommend any books for those interested to dig deeper. I got two books which I bought ten years ago which I still think should be relevant.

Programming Challenges: The Programming Contest Training Manual by Skiena and Revilla is exactly what it sounds like. A walktrough of different types of programming challenges and techniques for solving them.

Algorithms by Sedgewick and Wayne is more a general CS book and a good read.

Monday, 17 March 2014

New Eclipse project connected to Git repository

You have a new Eclipse project to work on and you want to version control it on a Git repository on another machine. Here's how you do it.

1. On the server you wish to have your Git repository remotely stored, setup an empty repository. In our examples here the repository and project will be called UVaOnline

johan@htpc ~/gitrepos $ pwd

/home/johan/gitrepos

johan@htpc ~/gitrepos $ mkdir UVaOnline.git

johan@htpc ~/gitrepos $ cd UVaOnline.git/

johan@htpc ~/gitrepos/UVaOnline.git $ git --bare init --shared=all

2. Add a file to this empty project if you wish to make sure there is something to check out from Eclipse.

johan@htpc ~/gitrepos $ cd /tmp

johan@htpc /tmp $ git clone /home/johan/gitrepos/UVaOnline.git/

johan@htpc /tmp $ cd UVaOnline/

johan@htpc /tmp $ touch FOO

johan@htpc /tmp $ git add FOO

johan@htpc /tmp $ git commit

johan@htpc /tmp $ git push origin master

3. In Eclipse, add the Git Repository Exploring perspective via the menu Window -> Open Perspective.

4. Click the "Clone a Git repository and add the clone to this view" icon in the perspective.

5. In the first step of this wizard, enter the connection details for the Git repository. In my case I use my SSH user to log on the server.

6. If your project was emtpy, you only have the master branch to work with. Click next.

7. Choose where you wish to have your local Git repository on the machine Eclipse is installed on. I selected the "Import all existing projects after clone finishes".

8. Now a new repository should be available in the Git repository Explorer.

9. Try to fetch the repository if you want to see the refs configurred.

10. You see the connection URL again, next.

11. Here you could add or delete refs pointers. For our purposes, this looks alright.

12. To import the project into the Java/Java EE perspectives, click the "Import Projects" option in the menu.

13. Since this project is a bare bones project with no .project file with Java facets, we must select to import the project as a "Import as general project".

14. Probably the default name is good enough.

15. Now you could start adding file to your project. I made my project a Maven project here and added some source code. Once you're ready to commit to the remote repository, right-click and choose Team -> Synchronise Workspace.

16. In the next view, select the files you wish to commit and add them to Git index first.

17. Then right click again and commit.

18. If you do not click "Commit and push" the files are only added to your local git repository on the same machine. That can be good if you want to pile up a bunch of local commits before pushing to the remote repository. When you want to push all local commits, in the Java perspective right click and go via Remote to Push.

19. If it is the first time, you might need to add a spec for this. If you work on master which you do if you have followed the tutorial, simple add "refs/heads/master" as source and destination ref and click "Add spec".

20. New spec added.

21. Once finished, the commits you had locally are pushed to the remote repo.

Thats it. I have some more stuff on working with Git in Eclipse in this blogpost about Github cloning.

For more advanced Git usage, you might need the console. That's no problem, simply make sure to refresh the Eclipse project after working in the console since Eclipse must refresh the .git folder etcetera. See this Git cheat sheat for my favourites.

Tuesday, 18 February 2014

Porthello Legends - Android Reversi Port

I wrote a clone of the game Reversi/Othello for Android three years ago and it actually made some progress on Google Play (Market as it was called then). It was downloaded nearly 1000 times and a lot of players seems to have enjoyed it because I could see in the highscore lists and the Analytics statistics that they spent quite some time on trying to beat the AI to gain access to even harder levels.

Unfortunately, the name Othello Legends was a bad choice since it was trademarked. That meant that after 6 months the game was suspended from Google Play. Since I put quite a lot of effort into the game it is a shame that it disappeared and I have now done a refactoring to the new name Porthello Legends :-).

My Android tablet died last summer and all I got to test with is an pretty old HTC Desire which the game seems to work fine on. It would be interesting to get feedback on how the game performs on newer phones/tablets as well.

Check out the game at Google Play, search for "Porthello Legends" or go to https://play.google.com/store/apps/details?id=se.noren.android.porthello&hl=en.

The old blogpost on the inital release can be found here http://macgyverdev.blogspot.se/2011/12/othello-legends-10-in-android-market.html.

Tuesday, 11 February 2014

Apache web server as reverse proxy and virtual host for Tomcat 7

I'm running a couple of sites on a Linux server on my local network. I only have one public ip address but I want to be able to host multiple sites on my servers. So this can be done by using a reverse proxy in front of the application servers which delegates traffic depending on which host name the end user was trying to request.

I've implemented this by using the reverse proxy feature with virtual hosts in the Apache HTTP web server.
The result this exercise is trying to achieve is that external traffic from internet requesting pages from the site www.all-about-units.com which is DNS mapped to my public ip address will be routed to the Apache web server on the local network via a port forwarding rule in the router on the local network. The Apache web server will look into its rules for virtual hosts and see that the user requested the domain www.all-about-units.com and therefore delegate the traffic to a separate Apache Tomcat 7 servlet container running on the local network to serve the request. The traffic is then forwarded back through the web server to the end user.

Setup Apache HTTP server as reverse proxy

First of all the Apache HTTP web server must be installed. I had mine installed since long ago but if I remember correctly there's not much than doing this is you have a Debian/Ubuntu/Mint like Linux server

sudo apt-get install apache2

Now, to enable the reverse proxy feature of the Apache server

/ $ sudo a2enmod proxy_http

Next, we must create a configuration for our new site as a virtual host. Go to the directory of available sites

/etc/apache2/conf.d $ cd /etc/apache2/sites-available/

Copy the default configuration to a setup for your site

/etc/apache2/sites-available $ sudo cp default all-about-units

Lets edit the configuration file. The first line tells the Apache server to listen on port 80, the standard HTTP port. Next the names of the virtual host are given. So traffic requesting another domain name or the ip address directly will not be handled by this virtual host configuration.

Next look att the ProxyPass and ProxyPassReverse, these tell Apache where to forward the request that has matched this virtual host rule. In this case to the Tomcat server on the same machine (127.0.0.1 is the loopback address of localhost) but on port 8080. The slash before the address means that all requests should be forwarded to the Tomcat server. If for example you had ProxyPass /foo/bar/ http://127.0.0.1:8080/ then the request www.all-about-units.com/foo/bar/page.html would be redirected to 127.0.0.1:8080/page.html

/etc/apache2/sites-available $ sudo vim all-about-units



<VirtualHost *:80>

  ServerName all-about-units.com

  ServerAlias www.all-about-units.com




  ProxyRequests Off

  ProxyPreserveHost On




  <Proxy *>

    Order deny,allow

    Allow from all

  </Proxy>




  ProxyPass / http://127.0.0.1:8080/

  ProxyPassReverse / http://127.0.0.1:8080/

</VirtualHost>

Enable and disable the site

To tell the Apache HTTP web server to enable this configuration, run the a2ensite command on the name of the new configuration

/etc/apache2/sites-available $ sudo a2ensite all-about-units


Enabling site all-about-units.

To activate the new configuration, you need to run:

  service apache2 reload

Reload Apache



/etc/apache2/sites-available $ sudo service apache2 reload

 * Reloading web server config

I had some issues so I restarted Apache the hard way also and then it works.

/etc/apache2/sites-available $  sudo /etc/init.d/apache2 restart



 * Restarting web server apache2                                                          apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName

 ... waiting apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName

That's it!

Verify in the logs of the application server that the request are directed correct and your home.
Next step might be to enable the mod cache feature of Apache to let the web server handle caching of all static material served by the application server like css, images and javascript. More on that in another post.

If you for some reason want to take the site of the web for some time you can instead of stopping the application server disable the site in the web server virtual host forwarding by using the command

sudo a2dissite all-about-units

Tuesday, 4 February 2014

How to make Jenkins install the packaged war in a Tomcat 7

This is a sequel to http://macgyverdev.blogspot.se/2014/01/setup-jenkins-project-from-git.html where I set up Jenkins to build my latest project. The next step is to install the war-file built by Jenkins on my Tomcat 7 server.

First of all we must make sure Tomcat accepts installations by scripting instead of the human html GUI version. In the tomcat-users.xml file, located in something similar to this:

/var/lib/tomcat7/conf/tomcat-users.xml

Make sure that the admin user, whatever his name is, has the role manager-script. Here's the important part of my file:

<tomcat-users>

</tomcat-users>

Now, in Jenkins you must install the Deploy Plugin. You find it in the plugins section.

Your Jenkins job is here assumed to be configured to package a war file. You can verify this by inspecting that a build generates a war file like this:

~/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war

For this Jenkins job, go to the settings page and add a new post-build action. Choose the "Deploy war/ear to a container".

In this view make sure the path to the packaged war is relative to the workspace area of the job. Most probably it will be "target/yourapp.war".

Now, rebuild your job and when inspecting the console logs you should see that the war/ear is installed!

Deploying /home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war to container Tomcat 7.x Remote
  Redeploying [/home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war]
  Undeploying [/home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war]
  Deploying [/home/johan/.jenkins/jobs/UnitConversion/workspace/target/Unitconversion.war]
Finished: SUCCESS

This way it takes two clicks from editing the code in Eclipse to update the site All About Units in production, first commit and push to Git, second start the Jenkins job.

One note, if you have deployed your application with context root "/" in Tomcat as I have done, see http://macgyverdev.blogspot.se/2014/02/how-to-change-context-root-to-in-tomcat.html, you can not have Context path = "/" in the configuration above. Instead, keep multiple context roots in Tomcat, and as in my example add the context root with a longer name, otherwise the deploy plugin cannot undeploy the ROOT web application. If you according to my blog post on it has manipulated the server.xml a reinstallation of "/webappname" will also reinstall the ROOT application since they point to the same directory on disk.

How to change context root to / in Tomcat 7

Here's my problem, I have a new project going which will be deployed as http://www.all-about-units.com/en. It's a site for general facts and info about units. It's a Java web application running on a Tomcat 7 servlet container. It is automatically deployed to the context root "Unitconversion" since that is the name of the WAR-file when it is uploaded via the admin ui.
That means I will access it as

http://myserver:port/Unitconversion

I want it to be deployed as context root "/" so that the URL becomes this in production

http://myserver:port/

My Tomcat 7 is running on a Linux Mint server but I guess these instructions are the same regardless of operating system. Here's how I managed to get it working.

First I uploaded my new web application in the Tomcat manager admin GUI. We got something like this now.

Now, if you inspect the file system you should have your app deployed at /var/lib/tomcat7/webapps.

johan@johanhtpc /var/lib/tomcat7/webapps $ ll
total 31040
drwxrwxr-x 5 tomcat7 tomcat7 4096 feb 4 20:22 .
drwxr-xr-x 6 root root 4096 dec 18 19:25 ..
drwxr-xr-x 3 root root 4096 dec 18 19:25 ROOT
drwxr-xr-x 10 tomcat7 tomcat7 4096 feb 4 20:22 Unitconversion
-rw-r--r-- 1 tomcat7 tomcat7 23617897 feb 4 20:22 Unitconversion.war
drwxr-xr-x 8 tomcat7 tomcat7 4096 dec 28 12:44 WeatherStationServer
-rw-r--r-- 1 tomcat7 tomcat7 8140647 dec 28 12:44 WeatherStationServer.war

The hacky way now would be to change the ROOT application with my new code, a simple overwrite with my web app. But that doesn't smell so good...

So what you do is that you edit

/var/lib/tomcat7/conf/server.xml

and add these lines inside the <Host> element. Change "Unitconversion" to your application name.

</Context>

</Context>

This will make the default root application appear under context root "/ROOT".

Restart Tomcat to make changes take place.

sudo /etc/init.d/tomcat7 restart

And now the app is accessible at "/" and at the old context root "Unitconversion" as well. This is also seen in the GUI if you refresh the manager.

Tuesday, 28 January 2014

Java 8 in Eclipse

Have you tried out Java 8 yet? The official release is not until March 2014 but there have been a set of release candidates to play with. It is actually very easy to get up and running.

Download a JDK 8 snapshot from https://jdk8.java.net/ and install it.
Install a version of Eclipse that can handle Java 8 syntax. I downloaded Eclipse Luna 4.4 from http://downloads.efxclipse.org/eclipse-java8/2013-05-19/.

To get started, create a new project in Eclipse and make sure you have chosen the Java 8 JRE.

Here's a class that uses a lambda expression to instantiate the functional interface PowerCalculator. A functional interface is an interface with only one abstract method.

public class TestClass {

    interface PowerCalculator {
        int pow(int n, int exponent);
    }

    public static void main(final String[] args) {
        final PowerCalculator calc = (int n, final int exponent) -> {
            return (int) Math.pow(n, exponent);
        };
        
        int product = calc.pow(3, 4);
        System.out.println(product);
    }
}

This outputs the expected "81".

That was maybe not very cool, but when looking into the more functional style of programming you can do with lambda expressions, this is the most interesting Java upgrade since Java 5 in my opinion.
I wrote an article in 2012 on FP in Java http://macgyverdev.blogspot.se/2012/10/functional-programming-in-java.html and took a quick look now again to see whether the original spec still is valid.

Here is a few examples I tried on the Collections framework. There has been changes since the 2012 draft. For example is the stream() interface added, previously you could do a filter() directly on the collection. The stream interface allows for parallel execution and

public class TestClass {

    public static void main(final String[] args) {
     List<Animal> animals = Arrays.asList(new Animal(10), new Animal(20), new Animal(30), new Animal(40));
     
     // Each animal eats another calorie
     animals.forEach(a -> a.eat(1));
     
     // Print animal
     animals.forEach(a -> a.print());
     
     // Print only the animals which have more than 30 calories
     animals.stream()
            .filter(a -> a.nutrition() > 30)
            .forEach(a -> a.print());
     
     // Calculate total sum of all animals calories
     int sum = animals.stream()
                      .mapToInt(a -> a.nutrition())
                      .sum();
    }
}

class Animal {
 private int calories = 0;
 
 public Animal(int calories) {
  this.calories = calories;
 }
 
 public void eat(int calories) {
  this.calories += calories;
 } 

 public void print() {
  System.out.println("Calories: " + calories);
 } 

 public int nutrition() {
  return calories;
 }
}

The Eclipse support in Luna 4.4 is pretty good. But when using the command completition it is not obvious what the different lamda expression consumers expect. I guess once you have played around with it some more you get accustomed to the Predicate and Consumer APIs.