Wednesday, April 2, 2014

Sunday Serious Series: Sorting and Efficiency

Many people probably will not think that tidying a room is intellectually challenging, but to me, it took a lot of thinking time when I was in high school. From folding and putting the pile of clothes from laundry  away to filing the notes of every courses, many tasks involve some kind of sorting and I always want to know the best way to do it. It turns out that the same problem is encountered in computer science, and computer scientists have invented and characterized many different algorithms - so many that the problem for me now is to determine what criteria do I have for my room tidying problem so I can pick the most effici algorithm.

To start with, we will need some way to describe the algorithm, and in computer science it is done with a description of the complexity of the algorithm. The analysis can be done on resources (such as memory space and time), but it is not limited to those - instead we can work on, say, the number of arithmatic operation. After picking a certain criterion, we can study the behavior of the algorithm in different scenarios, and in a lot of times we wil study the asymptotic worse case scenario complexity. This is the case where we concern about what is the worst that can happen given a large enough input. This is useful since the situation resources usage is actually of high importance is in large-scale application  and it is best to optimize in this setting.

And now we can talk about the basic algorithms. As discussed in the post on recursion, it is easier to talk about them recursively even though they can be done iteratively, and that is what we are going to go through. The algorithms are then divided into two groups - the linear recursion ones and the divide-and-conquer ones. All of them are really common and I will not cover the details of the algorithm.

Bubble sort, selection sort and insertion sort all belongs to the first group. Essentially they work by reducing the problem size by one in each recursion, and solving the linear recurrence relation will give that the complexity will scale like the square of the size of input in the worse case.

Merge sort and quick sort are the divide-and-conquer ones, which break down the original problem into two (can be more) sub-problems and combine the results. This kind of recursion is solved by the Master Theorem (or more generally the Akra-Bazzi method), and it turns out that, given the pivot in quick sort is chosen well (go search online for details), both merge sort and quick sort scales like product of the size of the input and its logarithm. And it has been proved that this is the best you can get from algorithms that is based on comparison.

Of course, nothing is perfect and the way asymptotic complexity characterizes algorithms has it own deficit. First it just says it scales like certain function, but the constant of proportionality is hidden. In fact the constant might be so large that it is better to use an algorithm of higher complexity if the input is not 'large' large. Also, implementing the algorithm requires extra care if some other non-performance related characteristics of the input has to be considered (such as stability). It should also be mentioned that a lot of inputs, like the untidiness of my room, is not usually the worse case and the actual average behavior can vary. (Unfortunately average case scenario is much harder to analyze.)

The morale of the story, I suppose, is that the problem in my room does not worth the time used on analyzing algorithms.

Monday, March 10, 2014

Sunday Serious Series: Argh Recursion!

So, I just finished an assignment for CSC240, and there is one question that took me more than 4 hours to completely figure out. It is kind of silly, but the source of difficulty is simply recursion NOT being used, and hence I find my motivation to write about recursion and finish this long overdue post.

So what is recursion, specifically in computer science? Recursion is a way to view a problem or object in terms of components that are similar to the original subject of interest, and then the whole thing is reduced to recursive steps and/or base cases. For example, it can be finding the greatest common divider of two numbers using Euclid's algorithm, or a particular implementation of tree data type - linked tree. Frequently this is compared to the other way to do things, the iterative way. These two ways are equally powerful in their ability to handle things (you can loop over the numbers to find the greatest common divider, and tree can be done as nested list), but, as usual, one way is better than other in different scenarios.

The first thing to look out for when using recursion is correctness. Usually, when the problem and object is handled recursively, the correctness is simply tied to induction proofs. If a proof of correctness is required, to me it seems very natural to find a recursive algorithm or definition, since proving correctness for iteration can be a huge pain - you need to come up with invariant for the loop to show it is correct and that is what makes me working so long for the CSC240 assignment. Although proof of termination is more difficult in recursion when there is no base case to reduce to (Euclid's algorithm for GCD is a good example), the work saved here by adapting iteration is, in my opinion, slightly less than the extra work in proving correctness.

The second thing to be careful about in recursion is the amount of information to characterize the question or object. A lot of times what is required is not sufficient for a recursion and more information is needed to actually consider the subject recursively. I myself find this come up especially often when solving problem of mutable objects in OOP, since not only the basic induction has to be handled, dividing objects into smaller component for recursion also requires some more information to specific the operation. A wrapper-helper approach is often accompanying to separate the concerns, and sometimes help reducing computational resources taken up as in memoization.

In general, well-written recursions are easy to read and understand, and it is quite plausible that using it more often will make it even better - something that I don't think iteration has going for it.






Tuesday, February 25, 2014

Sunday Serious Series: A1 Report Part 1

There will be two posts (hopefully) this week, since I finally gathered enough inertia to write about my first assignment (A1) in CSC148!!! This part will focus on how I feel about the assignment retrospectively, and in the second part, which will come when result is actually released, I will talk about the the feedback I get, etc.
The main theme of A1 is the 4-peg variant of Tower of Hanoi. It takes me 2 all-nighters to finish (no no I didn't do it right before the deadline. I just couldn't resist to finish it in one go.) The work that was to be done can be loosely divided into three main parts.

Part 1

The part includes Step 1 to Step 4, and is the implementation of a class to represent the game itself. In this part I think the most important and also the hardest task is to understand the underlying design principle - which I would say is Separation of Concern. As in this page on separation of concern on Wikipedia, under this principle a computer program is seen to be composed of different sections, each section dealing with a separated concern, which is information that interacts with the codes. The ConsoleController class will deal with user input, while the TOAHModel and Cheese classes is responsible of modelling of the game. I myself think that OOP is highly related to this design philosophy - the implementation the Cheese class not only make it object-oriented, but also separates the rules of moving cheese (in TOAHModel) from Cheeses itself. Doing this part can be a nightmare if the principle is not fully appreciated - it will basically be a blind guess on how should codes be written to work with the give starter code.

Part 2

This part is just Step 5 and Step 6, but it is the essence of this assignment - solving the  4-peg variant of Tower of Hanoi. The solution is one of the most complicated recursion I have ever written, since it acts on mutable objects and extreme care is needed to keep everything in track. My implementation is slightly clumsy, not that it didn't give back the expected result, but a lot of computation is redundant. This and the method I learnt to resolve the redundancy will be discussed in greater detail in the next post, which will focus on recursion.

Part 3 (Bonus)

This is probably the easiest part, the only difficulty is picking a standardized key for sorting lists containing cheese objects. Not really requiring specific knowledge.

Overall I don't find the assignment particularly challenging in terms of new concept, but still I think a lot of confusion can be avoided if the instructions given are more clear - after all this assignment does not involves much freedom in design and the corresponding constraints should be easily understood. I have also looked into the second assignment, which involves many more design and lots of freedom is given. Perhaps I will give a intermittent report on my progress - it seems hard.



Sunday, February 9, 2014

Sunday Serious Series: Object-oriented Programming Rebooted

Two weeks after the first post.
A lot have changed, and one of them is my understanding on OOP after finishing the first assignment (yay!), and I find my first post is basically explaining something abstract with some other abstract concepts. Which is bad, real bad. So I guess I am rewriting that part, but I will leave the first post there as a reference (to how little I knew). See my good friend http://thefabricofourpeople.blogspot.ca on why I think so XD

What is Object-oriented Programming (OOP)?
OOP is a programming paradigm (style), where the idea of treating everything as objects - from integers and Boolean (which is interestingly a sub-class of integers, will explain what that means below) to functions and exceptions, they are all objects. User-defined classes  can be created if what you exactly need is not available. As mentioned in the first post, being object-oriented can provide insights into the actual relations in between different objects, and it is very intuitive (to think about, at least) since this is often how we see things in reality. Of course, this is a kind of formality, and its implementation and support is different for different languages, so the exact details can vary.

Formality? Does that mean inefficiency?
Yes and no. When I started using OOP in the first assignment, the first step in the instruction is to 'Read and understand the Cheese class.' and all the class specifies is attribute that is an integer, and a method to compare cheese base on that integer. All the work done by this Cheese class can be done without creating a new class. The value of having an object representing cheese is not there. Instead, it allows us to think about the cheeses and not the integer. This really outweighs the cost of the additional code needed , since when I am trying to work on later parts of the assignment (solving a 4-stool variant of Tower of Hanoi recursively), I can think in terms of the pieces that I need to move, and then code with the same thing. I don't need to mentally convert it back to some specific integers when I write the code, and in larger program there may just be too much such mental work to be done and errors are creeping in. So similar to many other kind of formality, it allows us to pay a small price and avoid a lot of trouble. It should be noted that it does not magically do the work for us, nor it is capable of doing something that the language itself could not accomplish.

Okay, does OOP has any other drawbacks then?
In my opinion, yes - some small ones do exist. Creating an extra layer of formality can lead to confusion when different people do it differently. To avoid this extra documentation and communication are often necessary, which is again work to be done - this also happens in the programming language level. Another possible problem is that while the formality makes it easier to think within its framework, it sometimes affect the ability to think outside of it. I don't know how big this problem is - this is purely theoretical, but I guess awareness is needed to avoid being constrained by OOP.

I guess this should showcase OOP a bit better than the previous post XD I will be posting shortly to talk about the assignment more specifically.

Sunday, January 26, 2014

Sunday Serious Series: Object-oriented Programming Part 1

Hi everyone, this is ACC and this is my courSe bLOG (SLOG) for CSC148 (no idea why I wrote 240 at first, and I am not sure if I should change the blog address now). The blog will be revolving around my adventure in CSC148, and this series will be the main focus where I report my learning progress each week. For this week, I will be writing about Object-oriented programming, on the concept itself.

Edit: See my second post, together with this they are a better round-up of OOP.
Object-oriented programming (OOP) is one of the so called 'programming paradigm', that is, programming style. In this programming style the main focus is all kinds of different abstract objects and their interactions, to make the purpose of the program more clearly presented. These objects can be inspired from real life objects intuitively, which is a main selling point for OOP. Since OOP itself is quite abstract too I shall illustrate this by two examples:

  • Real objects have all sort of properties, which are called attributes and methods in OOP. Attributes are like information such as height of a person, while methods are functions are what an object can carry out, for example the toast function of a toaster. You can then ask the toaster to toast the bagel, which is another object. Pretty straight forward.
  • Now of course objects in real life are not all distinct but can be categorized by their properties (example being laptop and desktop are both computer, and spork having the properties of spoon and fork). Now notice that this two way of categorization is different: a laptop inherits the properties of a computer and it could have other properties (a touch pad), but the property of spork is a composition of the properties of a spoon and a fork. This kind of difference is carried to OOP as inheritance and composition of objects (see http://learnpythonthehardway.org/book/ex44.html).

As you can see, OOP is pretty powerful in representing and illuminating intricate relations, and I would imagine there is much more that can be drawn from how OOP works. Next week I will be covering my actual experience of OOP in Python (by the time I finished my second exercise and started working on my first assignment).

P.S. Please tell me if the font size is too small, or any other suggestions you have!