Wednesday, November 21, 2012

UVA 12532 - Interval Product

http://uva.onlinejudge.org/external/125/12532.html

You are given a sequence of \(N (1 \leqslant N \leqslant 10^{5})\) integers \(X_{1}, X_{2}, \cdots, X_{n}\). You have to perform \(K (1 \leqslant K \leqslant 10^{5})\) operations over this sequence, which can be:

(1) Change an arbitrary value of the sequence.

(2) Given a range \([i, j]\) return if the product \(X_{i}, X_{i+1}, ..., X_{j-1}, X_{j}\) is positive, negative or zero.

As usual we are going to start asking ourselves if it's possible to solve this problem under the 2 seconds time constraint using the brute force approach? The answer is "maybe" after the UVA administrator install the new Quantum Computers servers :), because this is not the case a Brute Force solution is going to timeout.

To solve this problem I am going to describe two approaches, the first one using Segment Tree and the second one using Binary Indexed Trees (BIT):

In case that the reader is not familiar with this data structures I personally recommend the following TopCoder tutorials:
  1. Binary Indexed Trees by boba5551
  2. Range Minimum Query and Lowest Common Ancestor by danielp

(1) Solution using Segment Tree


The main problem of the Brute Force approach is that we need \(O(j - i + 1)\) time to answer each of the second operation queries. Using Segment Trees we can reduce this time to \(O(log N)\) where \(N\) is the size of our sequence.

The main idea is to construct a Segment Tree of the sub-intervals products of our sequence, for example, given the sequence \((-2, 6, 0, -1)\) we got the following tree:

There is just a little issue with this, the values of the given sequence are in the range \(-100 \leqslant X_{i} \leqslant 100\). This implies that we are in risk to get an overflow/underflow for some sequences, an easy one is \((100,100,100,100,100)\), if we execute the second operation over range \([0, 4]\) we got the value \(100^{5}\) that clearly surpass the 32 bit integer capacity.

Let's remember that this problem care just for the sign (e.g. negative, positive) between certain interval \([i,j]\). So to avoid the overflow problem we simply need to substitute all the negative numbers with -1 and all the positives values by 1. After applying the previous operations we got a tree that look like this:


This solution will give us an overall time complexity of \(O(K logN)\) which is enough to pass the 2 seconds time limit.



(2) Solution using Binary Indexed Trees  


The solution with BIT is a little bit less intuitive than the previous exposed, but we can easily came with it by asking ourselves the correct questions.

1. When the product of an interval \([i,j]\) becomes zero?

When there is one or more zeros in the interval.

2. When the product of an interval \([i,j]\) becomes negative?

When there is not zeros in the interval, and the total number of negative number between \([i,j]\) is odd.

3. When the product of an interval \([i,j]\) becomes positive?

When none of the other conditions holds then product of the interval \([i,j]\) is positive.

After considering these three questions it seems important to keep track of the frequency of zeros and negative numbers for any interval \([i,j]\) of our sequence. The easiest way to accomplish this task is maintaining two arrays with the cumulative frequencies of both types, for example:



So if we want to know the numbers of zeros between certain interval \([i,j]\) we just need to apply the old trick of \(freq[j] - freq[i - 1]\) (assuming that \(freq[0]\) is 0).

\(\\ freq[4] - freq[1] = 1 \\ freq[4] = 2 \\ freq[1] = 1 \\ 2 - 1 = 1 \)

Until this point everything seems just fine, however there is another problem we haven't consider yet, what happen when we update one of the values of our sequence (operation #1) ? if that's the case we also need to update our cumulative frequency array. It's not hard to see that even when we can perform operation #2 in \(O(1)\) time, the operation #1 is going to take in the worst case scenario \(O(N)\) time, that due to the vast amount of queries we can't afford for this problem.

Considering this problems, the necessity of using BIT seems more evident. We just need to carefully maintain the cumulative frequency updated after each operation is performed. The overall time complexity of this solution is \(O(KlogN)\).

Friday, November 02, 2012

UVA 11076 - Add Again

http://uva.onlinejudge.org/external/110/11076.html

You are given an sequence of \(n (1\leqslant n\leqslant12)\) digits, your task is to return the summation of all the unique permutations of those digits.

For example given the digits \({1, 2, 3}\) the sum of all the permutations is \(1332\):

\(123 + 213 + 132 + 231 + 312 + 321 = 1332\)

In order to understand the solution of the problem the reader should be familiar with the idea of permutations with repetitions.

The first question we should ask ourselves, it's possible to solve this problem using the brute force approach under the 3 seconds time limit constraint? The answer is no, the reason is simple, we are given at most 12 digits and 20,000 test cases. In the worst case scenario we have \(20,000 \cdot 12! = 9,580,032,000,000\), that is a pretty big number.
 
After realized that brute force is not feasible we should take a more clever approach. To simplify things a little bit I am going to divide the solution in the following two cases:

(1) All the digits in the sequence are unique.

A really important observation is that all the digits appear the same number of time for any position. Let's consider the initial example where the digits are \({1, 2, 3}\):

All the permutations gave us the following set:

\(\left \{ 123, 213, 132, 231, 312, 321 \right \}\)

If we put attention we can see that all the digits appear the same number of times (2 times) in each of the positions hundreds, tens and ones. The following example mark with red the hundreds positions:

\(\left \{ {\color{Red}1}23, {\color{Red}1}32 \right \}, \left \{ {\color{Red}2}13, {\color{Red}2}31 \right \},\left \{ {\color{Red}3}12, {\color{Red}3}21 \right \}\)

Another important detail to notice is once you fixed some arbitrary digit in certain position (e.g. ones, tens, hundreds, ...), the number of times that digit is going to occupied that position is \((n - 1)!\), where \(n\) is the length of the target sequence. Now let's denote \(S_{A_{k}}\) as the sum of all the positions that the \(A_{k}\) digit can occupied in all the unique permutations of the sequence, for each of the digits we got the following:

To get our result we just need to add up the partials sum of each element in the sequence: Generalizing the results obtained in the previous steps we have: Do not get confused by the formula \(\frac{10^{n} - 1}{9}\) this formula gave us the elements in the sequence \(\{0,1,11,111, ...\}\) (you can check this sequence in the OEIS website) .

(2) There is one or more digits repeated in the sequence.

The only difference between this case and the previous one is that now we need to eliminate the repetitions. Let's denote \(f(A_{k})\) as the frequency of the digit \(A_{k}\) in the sequence \(A\). We got the following expression:
Notice that because we can have repetitions the total sum \(S\), is restricted just to the unique elements of the sequence \(A\), this is why we introduce a variable \(j\) where \(1 \leqslant j \leqslant n \). We can also observe that the cases where each digit appear only one time, all the frequencies are going to gave us a product of 1, this means that the first formula is just one special case of the second one, knowing this we can forget about the first one and generalize both cases with the current formula.

If we implement the ideas set out above we are going to end up with an algorithm with overall time complexity of \(O(n)\).