RSS

Tag Archives: code

A Week Off

So I’m going to be taking next week off just so I have some more time to relax and refocus. I’ve felt that I never really got a chance to properly recharge between the end of finals and the start of my summer work. As I’ve said before, I just want some time free to think. More than anything, I enjoy having the time to do nothing in particular and not feel guilty about it either. My objectives for the week? I’d rather not specify exactly, but mostly I’m going to do lots of reading, some coding, some testing of Minefield, and clearing through all those links I’ve saved for reading at a later time.

I think now is also a good time to give an update on my research as I’ve completed about half of it or so I guess. So basically, the goal is to discuss the appliability of and to design a better resource allocation algorithm in a distributed computing system. Clearly the main problem is to figure out not only how to evenly distribute available resources in order to fully utilize the capabilities of the system, but also, on a local level, run each program as fast as possible. To accomplish this, it is not good enough to just give everyone an equal share – rather, you have to give each program the combination of resources that best fit its needs. Along a related thread of thought, we see as a natural consequence of this non-trivial allocation scheme that there exists some parts of the system – some nodes – are more heavily utilized, or more “popular.” But what is it that makes it so? These are the questions we are seeking to answer. Our approach is to look at the past for clues. So, we’re digging through many years’ worth of log files of usage in order to see how the popularity of nodes has evolved in the past. We are testing out several complex models of popularity metrics and seeing how well they match the data.

There are three of us working on this project and each of us are focusing on different aspects. For my part, I’ve been focusing specifically on how to extract pertinent data from the terabytes of textual log files in acceptable speed and how to best automate the task of actually finding relationships. I’ve decided to break up my work into many small programs that do different things and are tied together by scripts that automate as much as possible. Apart from modularity being good style, it also preserves sanity since this stuff can get pretty damn hairy.

There was one specific aspect that caught my interest in particular. The problem was tactfully posed by my professor as follows: let’s say we want to see how the temperature of water in a swimming pool and the height of the dive is related to how much pain the diver feels. Holding the temperature constant, we expect higher dives to hurt more but if the height isn’t too much, the relation may be weak and depend more on how you dive, etc. But conversely, how does temperature make a difference? At first glance, there’s no appreciable trend for a small range of temperature. But cross 32 degrees F and it’s a different ballgame. We strongly expect the height of the dive to affect pain felt and for it to hurt a lot more!

In this same, consider values of two seemingly independent data fields, one being the synthetic field of popularity and the other being a more tangible entity that deals with actual hardware. We seek to automatically find the ranges of the data that yield strong correlations.

In mathematical language, we seek to automatically perform a segmented regression on the data. Our final regression may be piecewise-linear, some crazy high order interpolated polynomial, or even non-parametric; but for our needs, this is at present not our concern. Thus, I chose to only look at the correlation coefficients. But there’s more than just one type: we are all taught Pearson’s coefficient, but other popular ones include Kendall’s Tau and Spearman’s Rho. Currently I’ve just implemented Pearson’s but this will almost certainly be replaced by Kendall’s. Why? Because linearity is not as importance as dependence in general. Having decided upon this, it was time to next decide how I was going to implement this in code. The main challenge was to automatically determine the relevant breakpoints that would yield the maximum correlations. There is a good deal of literature on this but in my preliminary search I only found papers that were either very marginally related to what I was doing or just too complicated, so I decided to implement a simple algorithm on my own.

I was writing this in C and so I grumbled about how I didn’t have my usual built-in data structures as in Java (grumble grumble) but I did have a decent ArrayList and HashMap implementation I could use (I miss my TreeMaps everyday…). Basically there are two parameters we wish to maximize: the correlation and the length of the segment. This need arises since a segment of one or two points will de facto have a correlation of unity and our data is certainly non-uniform. So, I thought of some kind of learning algorithm to accomplish the task and finally ended up with a simple algorithm that runs in ~2*N time (where ~ denotes tilde notation). Basically I start with some initial set of breakpoints evenly distributed along one axis. Then, I run a loop wherein I compare correlations of adjacent blocks. If their weighted sum of correlation and number of points in the block exceeds either block individually, I remove the middle breakpoint and coalesce the blocks. I recursively do this through all the blocks and then do the exact same thing in the reverse direction.

Now I have a set of unequal length blocks. I then copy all the blocks’ information into my ArrayList and sort by a custom comparator that compares the weight of the block, defined by: X*abs(correlation)+Y*(nelem_in_block), where X and Y are some constants.

In practice, this approach works reasonably well when the number of initial breakpoints is large enough. Besides, since we are only looking for heuristics, this does the job.

Now that that’s all done, I ran some tests on the data and the results are looking close to what we expected. When I get back, I’m going to tackle the task of tweaking and reworking the popularity metric to get even finer results.

But for right now, it is time to kick back and relax. Awesome.

Advertisements
 
2 Comments

Posted by on July 9, 2010 in Uncategorized

 

Tags: , , , , ,

Google Code Jam Global Round 1A

Okay so I didn’t make this time. I had enough solutions right to qualify but didn’t do it fast enough. And considering that the competition was on the same weekend that I had to move out of Princeton, I didn’t have the time or energy to do compete in Rounds 1B or 1C. But it’s all okay because I had a fun time and now summer is here! At least for the next two weeks before I have to get back and start summer research.

But regardless, below I have posted my solution to the first problem of Round 1A. I had a somewhat fast algorithm but nothing too special. Regardless, here is the source:

import java.util.*;
import java.io.*;

public class Join
{
   private String filename;
   private char[][] board;

   public Join(String f)
   {
      filename = f;
      openFiles();
   }

   private void openFiles()
   {
      Scanner s;

      try
      {
         s = new Scanner(new File(filename));
         int ct;

         ct = s.nextInt();

         for (int i = 0; i < ct; i++)
         {
            int N = s.nextInt();
            int K = s.nextInt();

            board = new char[N][N];

            for (int j = 0; j < N; j++)
            {
               String line = s.next();

               for (int k = 0; k < N; k++)
                  board[j][k] = line.charAt(k);
            }
            output(i + 1, process(K, N));
         }

         s.close();
      }
      catch (Exception e)
      {
         System.err.println("I/O error: " + e);
         System.exit(0);
      }
   }

   private String matchesRed(String tmp, String retval, int K)
   {
      String regexRed = ".*R{" + K + "}.*";

      if (tmp.matches(regexRed))
      {
         if (retval.equals("Red"))
            return retval;
         if (retval.equals("Neither"))
            retval = "Red";
         else
            retval = "Both";
      }

      return retval;
   }

   private String matchesBlue(String tmp, String retval, int K)
   {
      String regexBlue = ".*B{" + K + "}.*";

      if (tmp.matches(regexBlue))
      {
         if (retval.equals("Blue"))
            return retval;
         if (retval.equals("Neither"))
            retval = "Blue";
         else
            retval = "Both";
      }

      return retval;
   }

   private String process(int K, int N)
   {
      String retval = "Neither";
      gravity(N);

      // Check vertical
      for (int i = 0; i < N; i++)
      {
         String tmp = new String(board[i]);

         retval = matchesRed(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         retval = matchesBlue(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;
      }

      // Check horizontal
      for (int j = 0; j < N; j++)
      {
         String tmp = "";

         for (int i = 0; i < N; i++)
            tmp += board[i][j];

         retval = matchesRed(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         retval = matchesBlue(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;
      }

      // Check diagonal 1
      for (int i = 0; i < N; i++)
      {
         int orig = i;
         String tmp = "";

         for (int j = 0; i >= 0;)
            tmp += board[i--][j++];

         retval = matchesRed(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         retval = matchesBlue(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         i = orig;
      }

      int i = N - 1;

      for (int j = 0; j < N; j++)
      {
         int orig = j;
         String tmp = "";

         for (; j < N;)
            tmp += board[i--][j++];

         retval = matchesRed(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         retval = matchesBlue(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         i = N - 1;
         j = orig;
      }

      // Check diagonal 2
      for (int j = N-1; j >= 0; j--)
      {
         int orig = j;
         String tmp = "";

         for (i = 0; j < N;)
            tmp += board[i++][j++];

         retval = matchesRed(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         retval = matchesBlue(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         j = orig;
      }

      int j = 0;
      for (i = 0; i < N - K; i++)
      {
         int orig = i;
         String tmp = "";

         for (; i < N;)
            tmp += board[i++][j++];

         retval = matchesRed(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         retval = matchesBlue(tmp, retval, K);

         if (retval.equals("Both"))
            return retval;

         j = 0;
         i = orig;
      }

      return retval;
   }

   private void gravity(int N)
   {
      int j;

      for (int i = 0; i < N; i++)
      {
         String tmp = "";

         for (j = 0; j < N; j++)
            if (board[i][j] != '.')
               tmp += board[i][j];

         int diff = N - tmp.length();

         for (j = 0; j < diff; j++)
            board[i][j] = '.';

         for (; j < N; j++)
            board[i][j] =  tmp.charAt(j - diff);

      }
   }

   private void output(int caseNum, String result)
   {
      System.out.println("Case #" + caseNum + ": " + result);
   }

   public static void main(String args[])
   {
      Join app = new Join(args[0]);
   }
}

So basically, I first took the input and put it inside a two-dimensional character array. Then, I implemented gravity (if only it were so easy always!). I made the observation that all gravity does is move the non-period characters to the right side and pushes all intermediate period characters to the left. Originally, I had a recursive algorithm to do this, but this was way simpler. So, I parsed each line and collected the non-period characters in a temporary string. Then, I added sufficient periods as padding and then just threw my collected string at the end of each line. Do this N times, and you’re done. Of course, I could’ve used some regex fu to make this even faster, but it was straightforward enough and did the job in \mathcal{O}(N^2) which is the best asymptotic performance you can get (gotta read every character..).

Then, I did a series of checks to see if any K consecutive letters had been formed as a result. To do so, I decided that the simplest way to do so would be to just read in my array vertically, horizontally, and by diagonals and check to see a match. Rather than dealing with messy DFAs, a one-line regex check did the trick. Here, the regex I used was constructed as the following: ".*R{" + K + "}.*" for the red pieces. It’s a really simple one and basically it just looks at an arbitrary number of lead characters (“.*”) and then checks for exactly K characters of value “R” (R{K}) and then ignores an arbitrary number of characters that trail (“.*”). Of course, the regex string has to actually have a number in the place of K, which is why I constructed in this way. Similarly, to check for blue pieces, I just used ".*B{" + K + "}.*";.

You can check out the code to see how verbose, ugly, and unoptimized it is. But given that the largest value for N was 50, doing about 5 traversals of a 50×50 grid was acceptable. And in fact I was correct on this point. For the small input, the code run in 0.298s and for the large input, it runs in 0.980s. Good enough.

 
Leave a comment

Posted by on May 24, 2010 in Uncategorized

 

Tags: , , ,

Google Code Jam 2010 Qualifying Round Solutions (Part 3)

Finally, my solution for the Theme Park problem. For this one as well, I had a somewhat optimized solution that brought down 10^8 iterations to at most 2000, with an average case of around 500-750 iterations. The solution, again in Java, is below:

import java.util.*;
import java.io.*;

public class Theme_large
{
   String filename;
   ArrayList<Group> groups;

   public Theme_large(String f)
   {
      filename = f;
      openFiles();
   }

   private void openFiles()
   {
      Scanner s;

      try
      {
         s = new Scanner(new File(filename));
         int ct;
         long R, k;
         int N;

         ct = s.nextInt();

         for (int i = 0; i < ct; i++)
         {
            R = s.nextLong();
            k = s.nextLong();
            N = s.nextInt();
            groups = new ArrayList<Group>();

            for (int j = 0; j < N; j++)
               groups.add(new Group(s.nextLong()));

            //System.err.print((i+1) + ": ");
            output(i + 1, process(R, k, N));
            //System.err.println();
         }

         s.close();
      }
      catch (Exception e)
      {
         System.err.println("I/O error: " + e);
         System.exit(0);
      }
   }

   private long process(long R, long k, int N)
   {
      long r = 0;
      int n = 0;
      long totEarn = 0;
      long k_i;
      int n_start = 0;
      long first;

      if (N == 1)
         return groups.get(0).getSize() * R;
      // loop as many times as necessary to start at same spot
      // again.
      while (r < R)
      {
         k_i = 0L;
         n_start = n;

         // check if you've already started here - a cycle
         if (groups.get(n_start).isHit())
            break;

         // set earnings at start
         groups.get(n_start).setEarn(totEarn);

         while (true)
         {
            // if adding the next group will exceed k, break
            if (k_i + groups.get(n).getSize() > k)
            {
               totEarn += k_i;
               // hit the one you started on.
               groups.get(n_start).hit();
               groups.get(n_start).setR(r);
               break;
            }
            else
            {
               k_i += groups.get(n).getSize();
               n = (n + 1) % N;
            }

            // one group can't ride more than once at the same
            // time!
            if (n == n_start)
            {
               totEarn += k_i;
               // hit the one you started on.
               groups.get(n_start).hit();
               groups.get(n_start).setR(r);
               break;
            }
         }

         r++;
      }

      if (r == R)
         return totEarn;

      long loopCt = groups.get(n_start).getR();
      first = groups.get(n_start).getEarn();

      long quo = (R-loopCt) / (r-loopCt);
      // how many full cycles will exist; round down

      totEarn = first + ((totEarn - first) * quo);
      // find that totEarn
      r = loopCt + ((r-loopCt) * quo);
      // effectively performed that many cycles

      // do loop again to take care of remainder, ignore hits

      while (r < R)
      {
         k_i = 0L;
         n_start = n;

         while (true)
         {
            // if adding the next group will exceed k, break
            if (k_i + groups.get(n).getSize() > k)
            {
               totEarn += k_i;
               break;
            }
            else
            {
               k_i += groups.get(n).getSize();
               n = (n + 1) % N;
            }

            // one group can't ride more than once at the same
            // time!
            if (n == n_start)
            {
               totEarn += k_i;
               break;
            }
         }

         r++;
      }

      return totEarn;
   }

   private void output(int caseNum, long result)
   {
      System.out.println("Case #" + caseNum + ": " + result);
   }

   public static void main(String args[])
   {
      Theme_large app = new Theme_large(args[0]);
   }
}

class Group
{
   private long size;
   private boolean hit;
   private long totEarn;
   private long r;

   public Group(long i)
   {
      size = i;
      hit = false;
      totEarn = 0;
      r = 0;
   }

   public void hit() { hit = true; }

   public boolean isHit() { return hit; }

   public long getSize() { return size; }

   public void setEarn(long e) { totEarn = e; }

   public long getEarn() { return totEarn; }

   public void setR(long rr) { r = rr; }

   public long getR() { return r; }

}

In this solution, I realized that we should just think of the line as a cyclic graph component (or, equivalently, a circular linked-list) and think of the roller coaster car as moving around the cycle. Eventually, I thought, we’re going to start someplace we started before. And if we do it once, we’ll probably do it again and again. But we already know how much money will be earned by that! So, the code first begins executing normally, but dropping “markers” every time it starts somewhere. If it sees that it has already start here, we have just found ourselves a cycle! The additional fields in the Group class just store information about how many rides there are in a cycle and how much money is made in a cycle. From this information, we can jump ahead many, many simulations of the roller coaster until we have fewer rides remaining than the length of  a cycle. From this point on, the program executes in a very simple fashion to the end. A slightly better approach would be to store the amount made in each Group element and so compute the remaining amount in constant time. But nevertheless, I was happy with this solution. The solution, on a 2.53 GHz Core 2 Duo processor, ran in 0.442s.

And that concludes this Google Code Jam solution series.

 
Leave a comment

Posted by on May 13, 2010 in Uncategorized

 

Tags: , , , ,

Google Code Jam 2010 Qualifying Round Solutions (Part 2)

Here’s my solution to Fair Warning Large input. It’s actually very similar to my Small input solution except that longs are replaced with BigIntegers (and it’s designed to handle cases larger than 3. Anyway, here’s my solution:

import java.util.*;
import java.io.*;
import java.math.*;

public class Fair_large
{
   String filename;
   ArrayList<BigInteger> list;
   ArrayList<BigInteger> diffs;

   public Fair_large(String f)
   {
      filename = f;
      openFiles();
   }

   private void openFiles()
   {
      Scanner s;

      try
      {
         s = new Scanner(new File(filename));
         int ct;

         ct = s.nextInt();

         for (int i = 0; i < ct; i++)
         {
            list = new ArrayList<BigInteger>();
            diffs = new ArrayList<BigInteger>();
            int ct2 = s.nextInt();

            for (int j = 0; j < ct2; j++)
               list.add(new BigInteger(s.next()));

            output(i + 1, process(ct2));
         }

         s.close();
      }
      catch (Exception e)
      {
         System.err.println("I/O error: " + e);
         System.exit(0);
      }
   }

   private String process(int length)
   {
      Collections.sort(list);
      BigInteger T = BigInteger.ZERO;

      if (length == 2)
      {
         T = list.get(1).subtract(list.get(0));

         if (T.equals(BigInteger.ONE))
            return "0";

         return distToNextMultiple(list.get(0), T).toString();
      }

      for (int i = 1; i < list.size(); i++)
         diffs.add(list.get(i).subtract(list.get(i - 1)));

      while (diffs.size() > 1)
         diffs.add(0, diffs.remove(0).gcd(diffs.remove(0)));

      T = diffs.get(0);

      if (T.equals(BigInteger.ONE))
         return "0";

      return distToNextMultiple(list.get(0), T).toString();
   }

   private BigInteger distToNextMultiple(BigInteger x,
                                         BigInteger T)
   {
      return T.subtract(x.mod(T)).mod(T);
   }

   private void output(int caseNum, String result)
   {
      System.out.println("Case #" + caseNum + ": " + result);
   }

   public static void main(String args[])
   {
      Fair_large app = new Fair_large(args[0]);
   }
}

So in this solution, I did use some simple optimizations to reduce the amount of actual computation needed to produce my solution. First, I handled the 2 case by itself since you don’t actually need to use GCD for that. I created an auxiliary array called diffs that holds the differences between consecutive numbers (in the sorted array). Thus, we know that adding some y \geq 0 does not change the differences between them; so, if they must all be multiples of T, then their differences must already be multiples of T. Then, the GCD of the all the numbers of the auxiliary array is calculated. In my small input solution I wrote a slightly optimized (non-recursive – to save on mem read/writes on the stack) GCD function based on Euclid’s algorithm. At some point I also used Stein’s algorithm for the GCD calculation. But for the large input, I decided to just use the built-in GCD function that comes with BigInteger.

One key property that I exploited was that:

\gcd(x_1, x_2, \ldots, x_n) = \gcd(\cdots\gcd(\gcd(x_1, x_2), x_3)\cdots)

Thus, I computed the GCD of two consecutive numbers (the first two of the aux array), removed both, and added in their GCD. Continuing in this way until there was only one element left, I had computed the GCD of all the numbers with reasonable speed. The other interesting function was the distToNextMultiple function.

Mathematically, this function returns:

(T-(x \mod T))\mod T

We know that if x < T then x \equiv x \mod T but otherwise, it is the remainder after division by T. Subtracting this value from T gives how much is left until the next multiple. But why the extra modulus T? If x \equiv 0 \mod T, then T - (x \mod T) yields T , which is wrong. Thus, the extra modulus T takes care of this case.

And that’s pretty much it. A pretty simple solution that works quickly. On the same Core 2 Duo machine, the solution for the large input runs in 0.436s. Good enough for me.

 
1 Comment

Posted by on May 13, 2010 in Uncategorized

 

Tags: , , , ,

Google Code Jam 2010 Qualifying Round Solutions (Part 1)

Okay so now I’ll post some of my solutions for Google Code Jam Qualifying Round 2010. The first one was called the Snapper problem. You can read the description of the problem here.

This solution was a very, very simple brute force solution. It worked well for the small input but obviously did not work for the large input. Anyway, here it is:

import java.util.*;
import java.io.*;

public class Snapper
{
   String filename;

   public Snapper(String f)
   {
      filename = f;
      openFiles();
   }

   private void openFiles()
   {
      Scanner s;

      try
      {
         s = new Scanner(new File(filename));
         int ct;

         ct = s.nextInt();

         for (int i = 0; i < ct; i++)
            output(i + 1, process(s.nextInt(), s.nextInt()));

         s.close();
      }
      catch (Exception e)
      {
         System.err.println("I/O error: " + e);
         System.exit(0);
      }
   }

   private String process(int N, int K)
   {
      ArrayList<Snap> list = new ArrayList<Snap>();

      for (int i = 0; i < N; i++)
         list.add(new Snap());

      // first one has power
      list.get(0).flipPower();

      // snap, snap, snap yo fingers!
      for (int i = 0; i < K; i++)
      {
         // if it has power, flip its state
         for (int j = 0; j < N; j++)
         {
            if (list.get(j).getPower())
               list.get(j).flipState();
         }

         int j = 1;

         // turn on power of all those whose prev's
         // state & power is on
         for (; j < N; j++)
         {
            if (list.get(j - 1).getState() &&
                list.get(j - 1).getPower())
               list.get(j).onPower();
            else
               break;
         }

         // j is the index of the first snapper whose prev's
         // state is off
         // turn rest's power off
         for (; j < N; j++)
            list.get(j).offPower();
      }

      // check if all powers and all states are on.
      int i = 0;

      for (; i < N; i++)
      {
         if (!list.get(i).getState() || !list.get(i).getPower())
            break;
      }

      return (i == N) ? "ON" : "OFF";
   }

   private void output(int caseNum, String result)
   {
      System.out.println("Case #" + caseNum + ": " + result);
   }

   public static void main(String args[])
   {
      Snapper app = new Snapper(args[0]);
   }
}

class Snap
{
   private boolean state;
   private boolean power;

   public Snap()
   {
      state = false;
      power = false;
   }

   public void  flipState() { state = !state; }

   public void flipPower() { power = !power; }

   public boolean getState() { return state; }

   public boolean getPower() { return power; }

   public void offPower() { power = false; }

   public void onPower() { power = true; }

}

I know that it’s super long and unnecessarily OOP’d, but it was my first solution so I figured it was okay. The solution (for small input) ran in 0.647s on an Intel Core 2 Duo 2.53 GHz, 4GB RAM, running Ubuntu 10.04.

The approach is pretty much self-explanatory so I won’t go into the details of it. It’s also sorta commented so you can figure out what I did. Obviously, I could have made it more efficient but it got me the answer quickly anyway so life’s good.

 
2 Comments

Posted by on May 13, 2010 in Uncategorized

 

Tags: , , , ,