# Tag Archives: comp sci

It’s been many months since I’ve posted here, due mostly to an extreme influx of work from classes and my jobs. But that doesn’t mean I’ve given up on this blog! Rather, I have a huge number of updates, which I will only describe in short detail for now. In no particular order:

After working at the Summer School in Theoretical Computer Science at Princeton, I went to one more conference the week after the school closed out. It was the annual RANDOM/APPROX dual conference, also held at Princeton. It was extremely interesting, with riveting talks by a variety of excellent speakers. I particularly enjoyed the keynote address by David Williamson, who spoke of a new generation of approximation algorithms called “lightweight algorithms” that he believes should and will define the future of the field. I saw a lot of really interesting talks and having taken a graduate course on this last semester, I was able to at least follow a good portion of most talks.

Next, I have begun my fall semester (actually by this point, it’s more than half done..)! I ended up taking two grad courses: Advanced Complexity Theory with Sanjeev Arora and an Advanced Topics course (aka, new courses which may or may not be kept in future semesters) on Information Theory with new professor Mark Braverman. Both are extremely interesting and have gotten very challenging. I think however that complexity theory started off much harder for me since I haven’t taken the undergraduate version of this course first, but as we moved on to new material, I feel like I’m keeping up fairly well with the class. As for information theory, it started off introducing really cool ideas that were pretty straightforward to follow, but our most recent topic of communication complexity is extremely difficult for someone with no prior knowledge or experience with this. The class has a couple of PhD candidates who are very familiar with this material, so I guess that somewhat helps a bit. But it goes very fast and I need to constantly review my material. Nonetheless, it’s a ton of fun! I’m also an undergrad grader for COS 340, which is called Reasoning about Computation. It’s sort of an introduction to theoretical CS through a variety of topics including probability, hashing, algorithmic analysis, approximation algorithms, graph theory, and more. Lots of super interesting material. It’s by no means a trivial course (either to take or to grade..) but it’s an excellent jumping point for anyone interested in theory. I actually haven’t taken this course in the past due to scheduling issues, so I’m enjoying following the material as the class does it too.

We also had a Quantum Computing Day mini-workshop at Princeton. I’m hugely interested in this subj. ct although I have little formal background in it. Nonetheless, I decided to attend anyway. The keynote talk was by Scott Aaronson of MIT, a well-noted theorist in this field. The talk was, as you might expect, brilliant. I’m pretty intrigued by how matrix determinants and permanents keep popping up in so many different places. Aaronson showed the application of those concepts in dealing with fermions and bosons, which given my limited quantum physics knowledge was new to me. There was also another great talk by an IBM researcher; this one a more applied talk about actually building larger multi-qubit gates while dealing with the immensely complicated issues of maintaining the superposition. The talk was very positive and the speaker showed that IBM has made a lot of progress in pinning down some of the problems. Although many believe QC is just a fad and will never actually make it to utility, I’m optimistic to see and follow real progress being made. If for nothing else, for science.

Finally, one more big news. I was offered an internship at Google in NYC! I had interviewed with Google late last year and made it to the host matching round. However, since it was already so late, there weren’t any available projects for me. Luckily, working through a recruiter over a summer, I was able to get a headstart on the process this year and was also waived through the technical interviews since I had done them just a few months prior. I was paired up with a prospective host from the NYC office and that interview went well. Before I could hear back, I was also paired up with a host from the Mountain View office and had that interview as well. In the end, it turned out that both hosts’ reactions were positive, but I chose NYC for a variety of reasons, mostly convenience. I’m extremely excited about this opportunity and I’m eagerly waiting for the summer! The team I will be working on is Site Reliability Engineering (SRE) and my project is something to do with security.

So those are the big news items from the past few months. Hopefully, I will have time to keep this updated more regularly hereon.

EDIT: One more thing! The research that I worked on with a few other undergraduates and my professor from the summer of freshman year has finally been published and presented at IMC 2011 in Germany early in November! The paper is also now online here (I think this might be behind a paywall).

A direct link is also available here.

Posted by on November 23, 2011 in Uncategorized

## Approximation Algorithms Workshop, Princeton 2011

These signs were all over the place!

This past week, I’ve been kept busy by attending a workshop on approximation algorithms, hosted by the Center for Computational Intractability at Princeton University. The particular workshop is this one, the purpose being to recap the last decade of advances in this field and to set the direction for the future. It was a five-day workshop with something like thirty speakers in total, with talks ranging from 35 minutes to an hour. It was my first major workshop that I have ever attended! It happened somewhat coincidentally, as I had emailed someone from the department for something totally irrelevant and got a reply regarding it from my professor for the approximation algorithms class I took last semester – he encouraged me to attend the workshop this week if I was attended and so I did. There were maybe three or four undergraduate students apart from me at the conference and it was pretty amusing to have every conversation start with “So which year of the PhD program are you at?”

There were some really quite amazing talks, such as one by Vijay Vazirani who did a overview of how much progress has been made on the twenty-one problems he predicted last decade would be the major ones in the field (at least 16 of them have either been conclusively solved or made substantial progress upon). It was a pretty exciting lecture and what was really cool was that the name “Vazirani” had come up again and again when we were studying approx. algorithms in class and it was pretty amazing to see him in person. He was also one of the three authors of the famous KVV Theorem (Karp-Vazirani-Vazirani) that gave an optimal result for any approximation algorithm for online bipartite matching. That theorem came up over and over again in other talks too.

Another cool talk was by Subhash Khot, inventor of the Unique Games Conjecture in his seminal paper, On the power of unique 2-prover 1-round Games. He reported on progress on the relations of UGC to other problems. Indeed this conjecture as well made its way into a good number of other talks. I had only very vaguely heard about UGC before this, but after hearing so much about how powerful it is, it’s pretty amazing. It ties in to a bunch of other theorems (If UGC is true, then …) and all of the results we’ve seen so far (that I’m aware of) derive reasonable conclusions with the assumption of UGC. It’s more evidence that it’s probably true and if it were ever proven to be so, it would represent a major step forward in this field (and as one speaker said, if it were false, we’d basically be back at square one).

My professor gave a talk on finding dense subgraphs inside graphs. He converted the problem into a decision problem of whether or not we can distinguish whether or not a graph has a dense $k$-subgraph (basically a set of $k$ vertices with a lot of edges between them). It was mostly understandable because I have a tiny bit of background in graph theory, but nonetheless a very difficult problem indeed. The talk was really interesting and it was a shame it was so short.

There were other talks by people from Google Research, Microsoft Research Labs, Bell Labs, and several universities around the world. I really wish I could explain every single talk that was given over the five days, but unfortunately my memory does not serve so well. The take-home message for me was that this is a truly exciting field and one that is filled with many, many brilliant people (duh) and that is being so very actively developed upon. It was an overall awesome experience to be in the company of such great researchers. I just wish I knew more than I did so I could gain more from attending. But that’s what the future is for!

Posted by on June 18, 2011 in Uncategorized

## The Art of Donald Knuth

The other day I had a few extra minutes before heading off for work so I decided to take a trip to my Engineering Library to check out a book I wanted to study over the summer. The book was called Pattern Recognition and Machine Learning, by Christopher Bishop. It’s preparation for a class I’m taking next semester called Advanced Probabilistic Modeling. Anyway, they didn’t have it. So instead I just figured I’d kill some time by checking out the usual sections of the library. I stumbled upon a book called the ACM Turing Award Lectures: The First Twenty Years: 1966-1985. Seemed interesting enough to pick up for a quick glance and I happened to check out Knuth’s Turing Award lecture. I really enjoyed reading the lecture and came across a lot of ideas that I’ve been trying to say before on this blog, although not nearly as eloquently as that of Knuth. To give a brief summary, the title of the lecture was Computer Programming as an Art, a theme that no doubt very closely resembles Knuth’s thoughts in writing his AoCP series. Anyway, I really should have read a lot of this before because I found it very inspiring. I’m going to just list a few nice quotations from the talk below:

Implicit in these remarks [that computer programming needs to made more into a science as opposed to an artistic, or imprecise, process] is the notion that there is something undesirable about an area of human activity that is classified as an “art”; it has to be a Science before it has any real stature. On the other hand, I have been working for more than 12 years on a series of books called “The Art of Computer Programming.” People frequently ask me why I picked such a title; and in fact some people apparently don’t believe that I really did so, since I’ve seen at least one bibliography reference to some books called “The Act of Computer Programming.

I thought this was quite interesting. It seems pretty clear to me that Knuth was a revolutionary in his thinking regarding programming, even when the notion of telling a computer a list of instructions to execute was something very new. I guess the process that the world took to master (if you can call it that now) the techniques of programming pretty nicely resembles, almost fractal-like, the process of learning of a single individual learning it for himself. That is, it starts by taking a bunch of seemingly random words, characters, numbers and putting them together in some haphazard fashion until the compiler stops yelling at you. Then the individual learns that in fact there is more rigor to the process than just throwing everything at the wall and seeing what sticks. There is the evolution of the learning process from chaos to some order; a science if you want to call it that. But that’s very much only the first stages. Because beyond the beginning stages, everyone pretty much gets how syntax works and where to put your semi-colons and whatnot. It’s the style which is where the fun really comes in. And you may be able to define how many spaces there are in a hard tab or where to put your braces in some document, but the approach by which one tackles a problem is much more innate. That’s how we have neat little algorithms to do our bidding instead of trying to crunch out everything the brute-force way each and every time. Herein lies the transformation from a rigorous science back into something of an art form. I would argue, therefore, that the art of computer programming is an evolution from the science of computer programming, not its predecessor. Expanding on this a bit further is another quote:

It seems to me that if the authors I studied were writing today, they would agree with the following characterization: Science is knowledge which we understand so well that we can teach it to a computer; and if we don’t fully understand something, it is an art to deal with it.

I kind of like this definition actually. It ties in with my previous post about potential functions and how there is an element of creativity necessary to actually do something of substantial value in this field (as with any!). But therein lies the fun of it. To put it shortly, there’s gotta be a reason we still have humans instead of computers writing computer programs. Related to this idea is the following quotation:

The field of “automatic programming” is one of the major areas of artificial intelligence today. Its proponents would love to be able to give a lecture entitled “Computer Programming as an Artifact” (meaning that programming has become merely a relic of bygone days), because their aim is to create machines that write programs better than we can, given only the problem specification. Personally I don’t think such a goal will ever be completely attained, but I do think that their research is extremely important, because everything we learn about programming helps us improve our own artistry.

I totally loved the above quote. I think it does a great job of highlighting exactly what this whole post, and probably this blog, is about. The point here is that reading this speech has provided me with quite a bit of more inspiration. There is something quite nice about reading these words.

Finally, in closing, here’s the quote that explains, in part, the inspiration for the title of this post:

One of the first times I was ever asked about the title of my books was in 1966, during the last previous ACM national meeting held in Southern California. This was before any of the books were published, and I recall having lunch with a friend at the convention hotel. He knew how conceited I was, already at that time, so he asked if I was going to call my books “An Introduction to Don Knuth.” I replied that, on the contrary, I was naming them after him. His name: Art Evans.

Posted by on May 25, 2011 in Uncategorized

Tags: , , , ,

## Potential in the Making

$\Phi = \sum_{e \notin C} n^{2w_e}+n\exp\left(\frac{1}{2a}\left(\sum_{s \notin C^\prime} c_s - \sum_S 3x_sc_s \log(n)\right)\right)$

This was the equation my professor put on the board. Magically. If you’ve seen algorithmic analysis before,  you may recognize the form as a potential function. Effectively, this is a very convenient and powerful way to performing an amortized analysis on an otherwise difficult-to-analyze algorithm. Said another way, it’s a way by which we can keep track of money coming in and going out of a metaphorical bank. Think of it this way: some weeks you need only $5 dollars, every once in a while you need$10 and one time you need $500 for a big expense. How much money do you usually need to go a week? A worst-case analysis will yield$500/wk, a gross over-estimate. A best-case analysis will yield $5, which doesn’t come close to covering the$500 you need that one time. Instead, we look at it through a type of averaging that allows us to retain the intricacies of the nature of the algorithm but more cleanly and tightly express bounds on its performance.

Here’s another example. Give a good estimate for a representative value of the “ruler function”. I put that in quotes because even I confused it with the more famous function which has differing continuities at rational and irrational numbers. The sequence I am here referring to is Sloane A001511. A graph of the function below reveals the reason behind its naming.

Now once more, a worst-case analysis or best-case analysis fail terribly (giving values of one and arbitrarily large as the sequence gets longer). But we can do better and that therefore is the purpose of amortized analysis.

There’s a great plenty that can be said about amortized analysis but I will focus on one particular aspect here: how the hell did we get that thing on the top of this page? I said it was magical but that was a lie. As it turns out, the rationale for this particular function derives from its formulation via a casting into a linear programming question. From this then were the characteristic properties drawn and put together. The problem for which this potential function applies is the Online Set Cover problem, which I will describe quickly below:

### Online Set Cover

This is a pretty natural extension to a very common and simple problem called the Set Cover problem. The basic idea is this: you’d like to cover a finite set of elements using a subset from a set of subsets of the original elements. Formally, given $E = \{e_1, \ldots, e_n\}$ and a family of subsets $S =\{S_i\}_{i=1}^m$ with $S_i \subseteq E$, with possible overlaps between different sets. Each subset is assigned a non-negative weight $c_i$ for each $i=1, 2, \ldots, m$ and we wish to find the subset of family $S$ that both covers every element in $E$ and has the minimum weight. In the online variation of this problem, we are provided each $S_i$ one at a time and we must immediately update our current best minimum weight set cover. The potential function above is used in the analysis of one algorithm to solve this problem.

The moral of the story, as I have learned through the course of this semester, is that potential functions are not as free-form as I had first thought. I was formally introduced to this type of analysis in both of my algorithms classes this semester and through seeing the utilization of this technique time and time again, I realize that reductions and re-castings of the problem in an entirely unrelated framework can eek out insights that otherwise fare well in avoiding the eye.

This is another part in my journey to learn about the structure of computation. As corny as it may sound, I like to think that numbers, in accordance to the laws of mathematics in this universe, hold deeper secrets than we like to usually give them credit for. My real aim here is to unravel the complexities of varied algorithms in different applications and find a common, basic, irreducible element of them all. It wonders me to see how much more we can see if we just look at a problem in the right way. While linear programming is something I’ve come to gain a great respect for this semester, it’s still only the tip of the iceberg. There is an even broader scope by which we may express a problem and perhaps other techniques will be the topic of future posts.

Posted by on April 21, 2011 in Uncategorized

## Spring Semester Update

Today marks the beginning of Spring Break for us and the midway point of the semester as well. This is most definitely the hardest semester I’ve ever had so far and although it’s very stressful at times, I’m really enjoying it a lot. I guess in some ways this semester was a way for me to determine how much I actually loved computer science at the its core – to see how much I enjoyed it if I fully immersed myself in it. The results so far are better than what I had expected: Even though the great majority of the my homework and assignments are about computer science and even though I am studying or attending class on some CS subject nearly every single day, my interest in it has only increased. I feel very at home when learning about the theoretical aspects of computation and complexity, or knee-deep in the analysis of some algorithm. Even though the workload is just so very barely manageable, I’m glad I am taking what I decided to take and I’d like, to the extent possible, to complete all six courses I am enrolled in this semester. Six courses is a bit on the high side for an engineer here, but I guess it’s more about how interesting you find them. As I’ve said before, it only feels like work if you can actually distinguish it from play. In any case, most of the work that I have fits comfortably in the category of play, so it’s basically an excuse to do something I’d probably be doing – or wishing I could be doing – during that time anyway.

I had a few goals for the year. One to take at least one graduate class before the end of the year; to study lesser known data structures and algorithms; to learn at least one more popular language and one more obscure language; and to get an introduction on quantum computation. Of those, I am in the process of accomplishing the first few of them. Thanks to my class on approximation algorithms, which is both tough and very enjoyable, I am satisfying the first one. In my Theory of Algorithms course, we are covering quite a few different interesting data structures as well as algorithms. Although these are not exactly “lesser known” in the greater sense of computer science, they’re also not the ones you usually learn in an introductory course. Specifically, I was very happy to see that we covered Fibonacci heaps, rank-pairing heaps, AVL trees, and RAVL trees. These, especially the first, are among the more advanced data structures used for very large data sets in industry. If I recall correctly, I believe Google Maps uses some implementation of F-heaps in storing data. In general, self-adjusting data structures are extremely interesting and it’s so neat to get a pretty bound on their performance regardless of how convoluted the implementation and procedures may be. As far as the other goals, I’m making (slow) progress on picking up Python and we’re using C++ for our computer graphics course. And as far as lesser known languages, our compilers course is taught strictly in ML – a functional language. I am yet to fully wrap my head around those exotic things. More thought is necessary on that front.

Although I’m satisfied with the coverage of algorithms for trees and graphs so far, I’m now also interested in learning about probabilistic algorithms. I was turned in this direction after seeing the power of probabilistic rounding techniques in solving LP-relaxation problems for approximation algorithms for very common problems such as vertex cover. Although they may not always provide an optimal bound, they do sometimes make the code and analysis much simpler and offer better worst-case performance. And they are everywhere too. From the example I just mentioned to an implementation of MST in linear time (!) in the number of edges, randomized and probabilistic algorithms play a huge role. And looking at it from a more pure sense, it’s also particularly interesting to see how the nature of random numbers can reveal insight on the structure of computation. I’m starting to develop some ideas of my own; perhaps the better word for them would be “questions” or maybe just situations and combinations of known elements in a way I haven’t seen anywhere. Hopefully I will soon be able to gain the mathematical and analytical machinery to actually see if these ideas make any sense at all.

In other news, my copy of Knuth’s classic Art of Computer Programming Vol. 1-4A finally arrived this week! As you may know, the fourth volume was just released after something like 38 years. I’m very excited to check it out. It’s currently sitting, still shrink-wrapped, on the table beside me. During the next week I’ll see if I can make sense of it. On another note, I am very pleasantly surprised by the quality of CLRS, another very excellent algorithms book. The analysis is surprisingly clean and the writing is precise, which makes it a pleasure to read. And they tackle complicated concepts with ease, making it look like it takes hardly any effort, which is impressive. Over winter break, I had started on Michael Sipser’s Introduction to the Theory of Computation, another classic text. It turns out this gave me a very nice edge when we were covering DFA/NFAs, regular expressions, and context-free grammars in compilers. Although I had had an introduction to these concepts in a previous course, I accredit Sipser’s book for teaching them to me in a rigorous sense. Once again, I see the same ease with with Sipser explains these concepts and proofs and it’s quite impressive. It makes it as painless as possible to cover these proofs with sufficient depth. So I will have to continue on with that text, seeing as it coincides very closely with what I am interested in. And as for the last goal, it is yet unfulfilled. I need a whole lot more knowledge before I go near that subject. Perhaps next semester or next spring. There’s just so much learn.

Posted by on March 12, 2011 in Uncategorized

## Spring Semester 2011

Today is Friday (well it was about an hour ago) and that means that this week is done. This week also happens to be my first week of my spring semester. And it was a hell of a ride. My current mood right now is a combination of extreme tiredness, sleepiness, and a great deal of excitement for weeks to come. [Pretty sure that last statement is grammatically incorrect but I can’t be bothered to fix it because it conveys my feelings well].  Going into the planning stages for this semester, I sought to take on a major challenge for myself: to out-do anything I’ve done before and push myself beyond any boundaries I have previously thought was the limitation of my abilities. Obviously, classes here are really hard, but I see them as springboards rather than roadblocks. And the springboard metaphor is quite fitting: you jump on-board and you sink and sink for a bit, very well aware of your weight and limitations and how long they’re dragging you down – but that’s only for the first half. Once you reach that critical state when things start clicking, it’s all uphill from there. Suddenly your own weight is actually helping you go higher. And guess what? You’ll end up higher than you could’ve ever jumped otherwise.

The moral of the story is obviously this: Avoid falling off on the way down and it’ll be worth it on your way up.

And that’s how it was for a few of my classes last semester too. I took an upper level math class on numerical analysis and found it pretty hard at first. It was intimidating to be in a room full of people a year or more older than me and a professor who walk into the room, give a half-hi gesture, and almost immediately begin lecturing without break for the entirety of the class. Although I do enjoy math a great deal, it does in no way imply that I’m particularly good at it. A math TA from last year put it aptly when he said mathematics was the process of banging your head on a table until 1) you passed out, in which case you called it a day, or 2) you eventually made a breakthrough. As convenient and elegant as it may to think that there’s some magical state beyond which the marginal difficulty of learning the next theorem/definition/proof/algorithm falls off to some small, constant value, I really am starting to doubt that’s the case. The more I make progress in my education on the fronts of both math and computer science, I’m thinking that what instead happens is that we, either by consciously looking for it or by our minds’ own doing, start seeing the very same patterns and paradigms again and again. Of course this isn’t a new idea, but it hits you hard when you make the realization for yourself. It’s interesting because it’s almost as if my brain is developing some form of auto-complete: give it some properties of some new entity and it “predicts” some of the others that will follow. There are obviously tons of exceptions to this and that’s where the fun comes in and keeps things interesting enough to continue pursuing (although the first time I heard matrix multiplication doesn’t commute was jarring to my pristine understanding of the world). And it’s this same notion of “auto-complete”, or intuition, that gives a better grip on the weird world out there and thus provides the illusion that the marginal difficulty is indeed decreasing.

Another metaphor which I particularly like derives from the use of the term “structure” when thinking about a problem: namely, in the context of a phrase like “now after X years of research, we have a better understanding of the inner structure of [complexity class/research problem/concept/etc]…”. In my mind, I see each of these concepts not quite as black boxes, but as dark rooms, the size of which is sometimes unknown from our current perspective. And so long as Erdös was just being metaphorical about his Book, there aren’t any lighting fixtures in the room. All we are given is a bunch of little candles. In fact we do have a few matches but it’s much harder to light a candle with a match than it is to use an already lit one. And so we go about setting up tiny little candles all about this room. They each illuminate brightly things in their immediate vicinity but the falloff of light is pretty drastic at times. And sometimes different candles illuminate the same portion of the room. Ideally, we’d like to use exactly one candle, so perfectly positioned so that it lights the entire room, but finding that position is almost definitely at least NP-hard or something… The idea is that there are rooms with candles from other people, in fact all of the other people in the world. And then there’s your own room, where you have to discover which candles are lit by yourself. You don’t have to necessarily light them yourself, but you have to discover that they do indeed exist. But of course, the room is too large to light up fully. So instead, we attack a certain direction. Perhaps we like what candles so far have illuminated or perhaps we think we’re on the edge of a major push forward. In either way, we are forced to narrow down our search. It’s pretty amazing how much brighter my room has gotten in just the past few months. (Baseless prediction: the room is a actually cylindrical. Interpret that as you understand it.).

And all of these thoughts are what are following me around these days. I love a good mystery and this is exactly that. I am consistently more amazed and in a state of awe than I ever expected. And I find the intricate complexity of the surface lit thus far extremely beautiful. Although theoretical computer science has a bit less of natural feel (that is to say, closeness to the ways of nature and the universe) than mathematics, it’s still astonishing to see how things fit together. Yes, computer science is a man-made field consisting of arguably arbitrary dichotomies depending on who you ask. And yes, this field is still so very much in its infancy. But nonetheless, they reveal something deeper than the definitions that we have given them. To put it shortly, there’s still some magic left which lays undiscovered waiting for us. As frustrating as it is that we do not understand some seemingly elementary relationships between things, it’s also exactly that which gives it its charm. I was sitting in class this week, with the professor writing theorem after theorem on the board, each of which had to do in some way with P vs. NP. And I thought how much more boring the class would have been if indeed we did know the answer. Or how even more boring it would be if $P \neq NP$. As much as I hope it’s resolved soon, it’s the idea of not knowing which is incredible in some strange way. It keeps the magic alive and I like it.

I considered what courses I wanted to take this semester. There are lots of things I want to learn about in computer science with only time being the limitation. I decided to go forward with a bold move by taking two very difficult theory classes together. They are both on algorithms: one on the theory of algorithms taught by the great R.E. Tarjan and the other a graduate course on advanced algorithm design – specifically approximation algorithms. They are fast-moving and the latter is extremely difficult (I don’t doubt the former will soon become so too!). But I’m not getting off the springboard, no matter how tempting it may be. I will continue to push forward, on until that pivotal moment hits where things start finally making sense. I’m learning an insane amount of things every single day and it’s amazing that a lot of things which I had read about casually in the past are all suddenly coming together with a much brighter luminance. It’s hard and I anticipate lots and lots of banging heads on tables ahead, but it’ll be worthwhile. This is one of those utterly invaluable experiences that I wouldn’t give up for anything.

I started the week inspired and now I am more inspired than I recall ever being. I live in an amazingly intricate and beautiful world and all I want to do is keep lighting candles.

Posted by on February 5, 2011 in Uncategorized

## Facebook Hacker Cup Qualification Round

So. I have a final later today and I’m too bored of studying so I decided to write up a post on the latest Facebook Hacker Cup competition problems. As you may know, this past weekend, Facebook had the Qualification Round of their Hacker Cup. Basically, it’s a programming contest very similar to that of Google Code Jam that I mentioned a while back and follows a pretty regular format.

In this round, we only had to answer one question out of three correctly to qualify. I did qualify so I know I got at least one right. Strangely, they only tell you the results after the competition is over, which is both weird and annoying. Oh and they messed up sending out the emails to the people who qualified. I made an incredibly dumb error in that I accidentally downloaded the input file before I was done writing my code and so the 6-minute time limit expired before I could submit my result. Fortunately, Facebook lifted this six-minute restriction due to bugs in their system so my (eventual correct) solution was actually accepted. The problem description and my solution is listed below:

### Double Squares

I can’t actually access the original question right now so I’m paraphrasing here. Effectively, we are given an input file with the first line being N, the number of lines that follow. Each of those lines contain an integer between 0 and 2147395600, inclusive. (Not 100% sure if that’s the right number exactly but its within that range). The solution should contain one integer per line of input containing the number of ways that the number can be written (order doesn’t matter) as a sum of two squares.

### Solution

There was a very nice little hint in light gray at the top of the problem page that effectively said:

too hard for brute force, switch to dp

Nice. Gladly, I saw that and immediately switched to a smarter algorithm (I wasn’t sure how hard these problems were and I wasn’t going to waste time writing a sophisticated algorithm if it was unnecessary – although my solution is pretty straightforward for this problem).

First, I used a calculator to compute the square root of that number. It was in the order of about 46,000, which is still small enough that pre-computing a data structure of that size is not out of the question. Further, since that number fits comfortably within the range of double-precision numbers, there was nothing special I needed to do. I started off by generating an ArrayList<Double> of each of the numbers from 0 to 46,342 squared. This actually executed faster than I thought it would and so I continued along this idea. (Note: For use in the DP algorithm I will describe, a HashSet would be a faster data structure although since I couldn’t perform a binary search on it, I decided to use an ArrayList. Since this is performed only once per input line, I didn’t stress too much about it.) Next, having performed the binary search of the input number on my giant list, I computed and stored the index of the greatest value less than or equal to the number itself.

With this index, I created a second array which contained numbers only less than or equal to it. The logic here is that since all our numbers are non-negative (indeed, the squares of all real numbers are non-negative), each of the addends must be less than or equal to the number (remember, 0 is on the list too). Now having generated this sub-array, I traverse it and subtract each element of it from the input number itself. This is effectively subtracting one of the squares from the number. What is the reason for this? To check if the remaining quantity is a perfect square and if so, to note that the resulting pair is one that we are seeking. I used some quick and dirty counting scheme of adding a half for each time I saw this (since we will be hitting upon commutative pairs) and adding a whole number for the number itself. Then we simply return the count that we have accumulated.

Very simple algorithm and runs pretty fast. We do a constant amount of pre-processing so total time for that $\mathcal{O}(1)$. Now if our number is K, then binary searching for it runs in time complexity $\mathcal{O}(1)$ time (remember, the size of the big array is fixed). Creating the sub-array and traversing it and subtracting it from the original number runs in $\mathcal{O}(\sqrt{K})$ time and traversing the list to search for numbers in the original list runs in $\mathcal{O}(\sqrt{K} \cdot 1) = \mathcal{O}(\sqrt{K})$ time (remember again, the big array has a fixed size!). Thus, our total running time complexity for each query (input number) is also $\mathcal{O}(\sqrt{K})$. One optimization would be to not search the big list for a match for every number in the sub-array but rather just until $\lfloor \frac{\sqrt{K}}{2} \rfloor$ indices and then multiplying the count by two. In addition, if I had used a HashSet for the big array instead, we would enjoy amortized constant time searching which would give us a time complexity of $\mathcal{O}(\sqrt{K})$, which is the same big-Oh time complexity but much faster if you write it in tilde notation.

import java.util.*;
import java.io.*;

public class Main
{
int SIZE = 46342;
ArrayList<Double> squares;

public Main()
{
squares = new ArrayList()<Double>;

for (int i = 0; i < SIZE; i++)

solve();
}

private void solve()
{
Scanner in = new Scanner(System.in);

int N = in.nextInt();

for (int i = 0; i < N; i++)
output(in.nextDouble());
}

private void output(double i)
{
if (i <= 1)
{
System.out.println("1");
return;
}

double num = 0.0;
int index = Collections.binarySearch(squares, i);
if (index > 0)
num++;

index = (index >= 0) ? index: Math.abs(index) - 1;

ArrayList subList = new ArrayList();

for (int j = 1; j < index; j++)

for (int j = 0; j < index - 1; j++)
if (Collections.binarySearch(squares, subList.get(j)) >= 0)
num += 0.5;

System.out.println((int)num);
}

public static void main(String args[])
{
Main app = new Main();
}
}


On my computer, running Ubuntu 32-bit and OpenJDK with Core 2 Duo processors at 2.53GHz, the total time taken to run this program on the input file given was 0.264s.