JAVA | The Tokenizer

Although it’s not directly connected to NLP, I decided to dedicate this post to a general algorithm called “Backtracking”. In simple words Backtracking is a general way to solve a problem by using a computer. The algorithm is considered “general” since it doesn’t describe how to solve a specific problem (like sorting numbers or searching for an element in a list), but it gives some high-level principles to solve many different problems.

The main idea of Backtracking is to solve a problem by starting from point A and going through all the possible options/paths until reaching point B. When we find a path from A to B it means we found a solution and solved the problem!

The general form of a Backtracking algorithm:

The easiest way to implement a backtracking algorithm is by using recursion.

First, We need to define three cases:

Reject: We are in the wrong way → this path can’t lead us to the solution.
Accept: We find a solution for the problem.
Step: We are at some point between A to B, continue to the next step.

Second, we just put our cases in this “template” (pseudo code):

function backtracking (input_list, output_list):

	# Reject – this path doesn't lead to any solution
	if reject case = true:
		return false

	# Accept – We find a solution!
	if accept case = true:
		print output_list # Our solution
		return true

	# Step – go over all the possible options in the input
	for each (n in input_list):
		test =  backtracking(input_list.remove(n), outputlist.push(n)) # Recursion
		if test = true:
			return true

	# We didn't find any solution
	return false

# First call
call  backtracking({some_input}, {})

Example:

Lets take this list of numbers: {3,4,1}, and try to sort them using Backtracking.

First, our cases:

Reject – if at the end of the output we have two unsorted numbers.

Accept – the input list is empty! (= we sorted all the numbers)

Continue – each time take a number from the input and try to put it at the end of the output.

Running the algorithm:

First call: original list {3,4,1}, output {}

pick 3 → original list {4,1}, output {3} – Continue
pick 4 → original list {1}, output {3,4} – Continue
pick 1 → original list {}, output {3,4,1} – Reject! no more options, go backward
pick 1 → original list {4}, output {3,1} – Reject! no more options, go backward
pick 4 → original list {3,1}, output {4} – Continue
pick 3 → original list {1}, output {4,3} – Reject! try the next option
pick 1 → original list {3}, output {4,1} – Reject! no more options, go backward
pick 1 → original list {3,4}, output {1} – Continue
pick 3 → original list {4}, output {1,3} – Continue
pick 4 → original list {}, output {1,3,4} – Accept! This is our solution!

Putting it all together – Java Implementation for sorting with backtracking:

import java.util.ArrayList;
public class BacktrackingSort {

	public static int counter = 0;

	public static boolean sortBacktracking
		(ArrayList input_list, ArrayList output_list,	int level) {

		// Trace
		counter++;
		System.out.println(counter + "# Level " + level
				+ ": " + input_list + " " + output_list);

		// Reject – this path doesn't lead to any solution
		if (output_list.size() > 1 &&
				(output_list.get(output_list.size() - 2)
						> output_list.get(output_list.size() - 1))){
			return false;
		}

		// Accept – We find a solution!
		if (input_list.size() == 0) {
			System.out.println("Solution: " + output_list);
			return true;
		}

		// Step – go over all the possible options in the input
		for (Integer n : input_list) {
			ArrayList clone_input_list
				= (ArrayList) input_list.clone();
			clone_input_list.remove(n);
			ArrayList clone_output_list
			= (ArrayList) output_list.clone();
			clone_output_list.add(n);
			boolean test = sortBacktracking(clone_input_list,
					clone_output_list, level + 1);
			if (test){
				return true;
			}
		}

		// We didn't find any solution
		return false;
	}

	public static void main(String[] args) {
		ArrayList input_list = new ArrayList();
		input_list.add(3);
		input_list.add(4);
		input_list.add(1);
		ArrayList output_list = new ArrayList();

		// First call
		sortBacktracking(input_list, output_list, 0);
	}
}

Different variations of Backtracking:

Some problems have a single solution, but some others have multiple solutions. Moreover, some problems have good and bad solutions – for example: solving a maze.

Find any solution:

This variation of Backtracking stops once it encountered any solution for the problem. In case of a maze, once we find a path from the starting point to the exit – we return it as the solution.

This variation of Backtracking is actually similar to the sorting problems.

I wrote a simple Backtracking code for solving a maze with JAVA. You can find the code here, and here is its solution:

# # # # # # # # # #
# X * * #   #     #
#     * #   #   # #
#     * * * * * * #
#               * #
# # #   * * * * * #
#       * # # # # #
# * * * *   #   # #
# S #             #
# # # # # # # # # #

Find the best solution:

Sometimes finding any solution is just not good enough and we need to find the best solution. Again, in case of maze, we’ll probably want to get the shortest path to the exit. In this case you need to change the Backtracking template a little bit. First, you should always check all the possible paths! Even if you find a solution, you should continue looking for a better one. The trick here is to have a dynamic Reject case. At the beginning of the algorithm you set some best_solution variable to a maximal value. During the runtime of the algorithm, every time you find a better solution you update this variable. This variable gives you the ability to drop possible solutions without completely checking them.

For example, in the runtime of a maze solving algorithm, your best_solution = 11, which means you already found a way of 11 steps to the exit. Then when you check some other solution, you can drop it immediately if it becomes longer than 11 steps!

Here you can find a simple Backtracking code for getting the best solution of maze, and here is its output:

# # # # # # # # # #
# X     #   #     #
# *     #   #   # #
# *               #
# * * *           #
# # # *           #
# * * *   # # # # #
# *         #   # #
# S #             #
# # # # # # # # # #

Find all solutions:

Of course there is a variation of Backtracking for finding all the possible solution for a problems. For example: You have this input of digits: {1,2,3,4,5} and you want to find all the permutations that their sum equals 12. This variation of the algorithms is between the two other variations (going over all the possible options with a regular Reject case):

import java.util.ArrayList;
public class BacktrackingDigitSum {

	// Find all permutations the equal MATCH
	public final static int MATCH = 12;

	public static int counter = 0;

	public static void digitSumBacktracking	(ArrayList input_list,
											ArrayList output_list,
											int sum, int level) {

		counter++;
		//Uncomment the line below to follow the algorithm runtime
		//System.out.println(counter + "# Level " + level
		//			+ ": " + input_list + " " + output_list);

		// Reject – this path doesn't lead to any solution
		if (sum > MATCH) {
			return;
		}

		// Accept – We find a solution!
		if (sum == MATCH) {
			System.out.println("Solution: " + output_list);
			return;
		}

		// Step – go over all the possible options in the input
		for (Integer n : input_list) {
			ArrayList clone_input_list
				= (ArrayList) input_list.clone();
			clone_input_list.remove(n);
			ArrayList clone_output_list
				= (ArrayList) output_list.clone();
			clone_output_list.add(n);
			digitSumBacktracking(clone_input_list ,
					clone_output_list, sum + n,level + 1);
		}
	}

	public static void main(String[] args) {
		ArrayList input_list = new ArrayList();
		input_list.add(1);
		input_list.add(2);
		input_list.add(3);
		input_list.add(4);
		input_list.add(5);
		ArrayList output_list = new ArrayList();

		// First call
		digitSumBacktracking(input_list, output_list, 0, 0);
	}
}

Do and don’t do:

As a developer, I can tell you that once you understand the concept of Backtracking, it can be a really useful tool. It can help you to quickly design and implement algorithms that can solve quite complex problems. However, the biggest problem with Backtracking algorithms is that usually their efficiency is quite poor (Brute-Force). Therefore you should be really careful when you choose to use it. I have my own set of rules for using Backtracking:

Don’t use Backtracking to solve a problem that already has a known solution – for example sorting. Sorting numbers with backtracking is a very bad decision, you definitely should use Quicksort/Mergesort! You should use Backtracking only when there is no other known/easy way to solve your particular problem.
Use Backtracking to solve problems with a small input or with a small set of possible solutions, otherwise your CPU will really hate you!
Trade-off – This is the most important one for me. Sometimes there is a known solution for a problem, but implementing & QA it may take a long time. In this case you can consider using Backtracking, as long as you still receive reasonable performances.

The bottom line: Backtracking is like a developers’ 5Kg hammer – don’t use it to hang a picture on the wall, use it if you need to break the wall!

My next post is going to be about Spellcheckers and it will include a nice example of how to build your own simple spellchecker by using Backtracking. Stay tuned and follow me on Twitter.

Shlomi Babluki

The Tokenizer

Here’s a few things you might need to know, or maybe you just forgot…

Tag Archives: JAVA

Maze Solving Algorithm – Practicing Backtracking

Share this: