Shopee Code League 2022 Solution Writeup

    27 minute read    

Shopee Code League is a 2-week coding league consisting of two rounds of coding competitions and several workshops. I participated in Shopee Code League 2022 with two of my friends, who are also my colleagues (Joshua and Ren Xian). The coding league consists of the Qualification Round and Final Round. Only the top 100 teams from the Qualification Round can qualify for the Final Round.

Unlike the past year, this year’s Shopee Code League only has algorithmic competition, making it a pure competitive programming competition like Google Code Jam and Facebook Hacker Cup. Coupled with the super attractive prizes, I guess that successfully attracted many competitive programming experts like IOI medalists and ACM ICPC world finalists to the competition, which makes everything super competitive.

The Qualification round is held on the 19th of March 2022 and consists of 5 problems to be solved in 3 hours. Our team solved all problems in the qualification round and qualified for the finals with 44th place. The final round is held on the 26th of March 2022 and consists of 9 problems to be solved in 5 hours. Our team managed to solve 8 of the 9 problems in the final rounds and ranked in 62nd place.

It was a fun and challenging competition. However, several significant flaws in the competition affected the overall experience. The two of the most severe issues were the inferior competition platform and the clarity of the problem statement.

Shopee decided to use HackerEarth as the competition platform. To be honest, this is the worst ever platform I have ever encountered in my competitive programming career. It was believed to have over three thousand teams participated in the qualifications round. When the qualification round started, the HackerEarth server couldn’t handle the load of over 3000+ clients DDOS it simultaneously. As a result, I had experienced a more 10 minutes delay before I could successfully load the first problems.

Besides the delay at the beginning of the competition, the whole platform constantly lagged, and it took some time to load every page. Moreover, the scoreboard was not working, and some submissions took more than 5 minutes to judge. If I am not mistaken, the scoreboard took more than 60 hours to be fully updated and finalized 😆. In short, the platform is disastrous.

Also, this is a team competition, but they only allowed one account per team and one session per account to access the platform. Hence, we have to use some ways to share the problem statement and code. Our team decided to use a private GitHub repository to share the problem statement and our code.

Furthermore, most of the problem statements are poorly written, lack of clarity, lack of rigor, and some even come without an input constraint. That is unprofessional and gives out a lot of bad impressions to me 😕.

Overall, I still feel that it was a fun experience, and I like solving those problems. It reminds me of the competitive and nervous atmosphere of participating in an actual competitive programming competition. Just that it attracts too many competitive programming masters to the competition that makes it next to impossible for my team to win any prizes despite being able to solve almost all of the problems, we lost a lot in the time penalty.

Qualification P0 - Shopee Xpress Delivery

Problem Statement: pdf

This problem was the first problem I read, and I immediately jumped into solving it as I recognized it as a very straightforward BFS problem. However, due to some bugs in the implementation, I could not pass all the test cases on the first try. After debugging for quite some time, I decided to put this problem aside and look at other problems first.

Later on, I came back to this problem and realized that there might be a portal on the starting first cell, and my code will not consider that portal. After realizing that, I immediately made some minor modifications to my BFS code and successfully passed all the test cases.

Solution: BFS (Code)
Difficulty: Easy-Medium

The solution is a very straightforward BFS on the grid. We just need to be careful with the implementation and edge cases.

Qualification P1 - Installation of a Shopee Billboard

Problem Statement: pdf

This problem was solved by my teammate, but I do know the solution. At first, I noticed that this problem had missing input constraints. After I had thought of the problem for a while, I could not find any polynomial-time solution for this problem and hence skipped it and continued to other problems.

After that, my teammate just submitted his implementation, and his code was able to pass all test cases. It turns out that the intended solution is indeed exponential. It is just recursive backtracking with pruning.

Solution: Complete Search, Recursive Backtracking with Pruning (Code)
Difficulty: Easy-Medium

The solution is a very straightforward complete search, recursive backtracking with pruning. The version without proper pruning will not pass the time limit.

Qualification P2 - Fireworks Festival

Problem Statement: pdf

At first, when I was reading the problem statement, I had a hard time understanding the problem as the problem statement was extremely confusing to me. It took me quite some time to fully understand the problem.

After understanding the problem, I was sure that the solution was dp (dynamic programming). Just that if we did the dp without any optimization, the time complexity would be \(O(n^2m)\). I tried to use a priority queue to optimize the dp to become \(O(nm\log{n})\), but it still won’t pass the time limit. Finally, I modified my code to use the monotonic queue optimization, which reduced the time complexity to \(O(nm)\) and passed the time limit.

Solution: DP with Monotonic Queue Optimization (Code)
Difficulty: Medium-Hard

Before we start going into the solution, since the original problem statement is extremely confusing, let’s modify the problem statement and solve my version of the problem instead.

Modified Problem Statement:

Shopee will be hosting a fireworks festival in Singapore. There are a total of \(n\) cities in a straight line, and the distance between adjacent cities is \(1\).

There are \(m\) fireworks to be set off. The \(i\)-th firework will be set off at time \(t_i\) minutes in \(x_i\)-th city and rewarding \(c_i\) Shopee coins for each firework. If during the \(i\)-th firework set off and you are at \(y\)-th city, you will receive \(c_i - \lvert x_i-y \rvert\) amount of Shopee coins for watching that firework.

You cannot skip watching any fireworks, and it is possible to receive a negative amount of Shopee coins. It is also possible that two or more fireworks can be set off at the same time, and you will be watching both fireworks at the same time in some city.

What is the largest amount of Shopee coin you can obtain?

First of all, define the following function. Let

  • \(\text{cost}(i, j)\) be the number of coin you will receive if you watch the \(i\)-th fireworks at \(j\)-th city. So \(\text{cost}(i, j) = c_i - \lvert x_i - j \rvert\).

Then, let

\[\begin{align*} \text{dp}(i, j) = & \text{ maximum amount of coins you can receive for fireworks } 1 \text{ to } i \text{ }\\ & \text{ where you were at } j\text{-th } city \text{ during the } i\text{-th } firework \text{ set off} \end{align*}\]

So the answer that the probem requires is \(\text{max}(\text{dp}(m, 1),\, \text{dp}(m, 2),\, \ldots,\, \text{dp}(m, n))\).

The base case for the dp is

\[\text{dp}(1, i) = c_1\]

that is the maximum amount of coins rewards for the \(1\)-st firework since you can choose to start at any city to maximize your rewards.

Next is to determine how to compute \(\text{dp}(i, j)\) for the recurrence case. The naive and straightforward way is as follows

\[\text{dp}(i, j) = \text{cost}(i, j) + \text{max}(\text{dp}(i-1, y_1),\, \text{dp}(i-1, y_2),\, \ldots,\, \text{dp}(i-1, y_k))\]

where \(y_1,\, y_2,\, \ldots,\, y_k\) are the cities that are reachable from city \(j\) with the speed and time between fireworks \(i\) and \(i-1\), this can be computed with

\[\begin{gather*} \text{maxWalkDist} = (t_i - t_{i-1}) \times \text{speed} \\ \lvert j - y_1 \rvert,\, \lvert j - y_2 \rvert,\, \ldots,\, \lvert j - y_k \rvert \le \text{maxWalkDist} \end{gather*}\]

With this, the time complexity to compute a dp state is \(O(n)\), and there is \(n \times m\) number of dp states, so we now have a \(O(n^2m)\) solution.

Obviously, a \(O(n^2m)\) solution will not pass the time complexity. Hence we need to optimize the dp. To optimize the dp, we need the following observation.

  • \(y_1,\, y_2,\, \ldots,\, y_k\) are contiguous cities
  • when computing \(\text{dp}(i, j)\) and \(\text{dp}(i, j+1)\) they share a lot of similarity in terms of their reachable cities since the \(\text{maxWalkDist}\) will be the same
  • to be exact, there will be at most one additional reachable city and at most one fewer reachable city
  • WLOG, let \(y_1 < y_2 < \ldots < y_k\)
  • when comparing city \(j+1\) with city \(j\), the additional reachable city (if exist) will be \(y_k + 1\)
  • the unreachable city (if applicable) will be \(y_1\) (note that it depends on the situation, sometimes \(y_1\) will be reachable from both city \(j\) and \(j+1\))

More particulary, when computing \(\text{dp}(i, 1),\, \text{dp}(i, 2),\, \ldots,\, \text{dp}(i, n)\), we are actually finding the sliding window maximum. It is known that sliding window maximum can be solved in linear time with monotonic queue, so we can use monotonic queue to optimize the dp state computation. With monotonic queue optimization, \(\text{dp}(i, 1),\, \text{dp}(i, 2),\, \ldots,\, \text{dp}(i, n)\) can be computed in \(O(n)\) time and hence, the final time complexity is \(O(nm)\).

Qualification P3 - Connecting the Numbers

Problem Statement: pdf

This problem is one of the hardest problems in the competition. Fortunately, I was able to get some insight into solving this problem very fast. I was very confident that my approach was correct, and hence I jumped into implementing it. After like 20 to 30 minutes of implementation and debugging, I was able to pass all the test cases.

Now here comes the interesting part, after the competition, there was a discussion thread on hackerearth about this problem, saying that the model solution was wrong. If the model solution was wrong, that means my approach was wrong too. After running my code on the counterexample in the discussion thread, I confirmed that my solution was wrong. It really hurts me because I was sooooo confident that my original approach was brilliant and correct. I was so happy after my solution got accepted, but now it turns out that it is the wrong approach.

Luckily, the model solution of the problem setter was as wrong as mine, and hence our team got the points for this problem. So I guess we should feel happy about that too.

I will describe my “incorrect” solution that passed all the test cases below. Regarding the “correct” solution, I just know that this problem can be reduced to a 2-SAT problem where the boolean variables are whether each edge should be placed inside or outside the circle, and the clauses are each pair of intersecting edges. As for the implementation, although I know that 2-SAT is solvable in linear time in terms of the number of clauses, I don’t really know the exact algorithm and implementation for this problem.

Solution: Ad-Hod, Observation (Code)
Difficulty: Hard

The idea to solve this problem is that we need to find a valid drawing for the circle. If we found one valid drawing, we output yes, otherwise, we output no. So the hard part is to figure out how to get the valid drawing if it exists.

These are all the key observations that will help in deriving the solution.

  1. There are only two ways to draw each connecting line, either draw them inside or outside of the circle.
  2. If there exists a valid drawing without intersection, then flipping all the lines from inside to outside and outside to inside is another valid drawing.
  3. Inside and outside drawings are independent of each other, so we can draw all the outside lines first and then draw the inside lines.
  4. When you draw a line on one side, you split the circle into two parts. If there is another line required to cross the two parts, it has to be on a different side.

<TODO: insert image for observation 1, 2, 3, and 4 here>

So now, because of 3, we can focus on drawing the lines outside first and then draw the remaining lines inside. Then, because of 2, WLOG, we can draw the first lines outside. And because of 4, the circle is now split into two parts after drawing the first line. Then, we can repeat the process recursively for each part, find the next line that is within the same part, and then connect them to the same side again until there are no more lines that can be connected within the same part.

Because of 3, we can draw the line on the outside using the process described above first, then repeat the same process on the inside. If there are still lines that are not connected after the two processes, then there is no valid drawing.

<TODO: insert image for the process>

The incorrect part
The key reason that I am so confident that my approach is correct is that I thought, “this seems to be the only way to draw the lines, there is no other way” because everything seems to be deterministic and follows the observation. So if everything is deterministic, there is only one single way to draw the lines, my approach has considered every possible situation, and hence it must be correct. But that is not the case, and there is actually a degree of freedom hidden somewhere.

Notice the process below.

Then, we can repeat the process recursively for each part, find the next line that is within the same part, and then connect them to the same side again until there are no more lines that can be connected within the same part.

In the process above, we find the next line that is within the part and draw it on the same side. If this is the first-ever line, then it had no problem. That is completely fine. But if that is the second line, then it had the degree of freedom to draw it on the other side of the circle. So my approach is actually a greedy approach which greedily group the next line to be on the same side as the first line, but that is not necessarily needed to be on the same side in the final valid drawing.

Because of that, my approach is incorrect as I did not explore all possible ways of drawing. Below is a counterexample that will cause my approach to fail.

<TODO: insert image for the counterexample>

Qualification P4 - Money Transfer

Problem Statement: pdf

This problem is the easiest problem in the competition. However, I did not solve it first because I thought I could solve P0 and P3 first. It was a mistake not to solve this problem first because it will significantly increase our team’s time penalty.

After getting stuck in P0 and P3 with some bugs, I took 5 minutes to quickly solve this and submit it to prevent further increasing the time penalty.

Solution: Simulation, Hashmap (Code)
Difficulty: Trivial

The solution is just to simulate the problem scenario.

Final P0 - Stringing Strings

Problem Statement: pdf

My teammate solved this problem during the competition, but I do know the solution. Straightforward recursive backtracking will work.

Solution: Complete Search with Recursive Backtracking (Code)
Difficulty: Easy-Medium

We just need to make sure that during the backtracking or DFS, we prioritize the lower indexed potential pieces first so that our final answer has the smallest lexicographic order.

We can also use rolling hash to optimize the string comparison, but it is not needed to pass the time limit.

Final P1 - Can You Bounce?

Problem Statement: pdf

At first, I had no idea how to solve this problem, but my teammate mentioned that this seemed to be solvable with the binary lifting technique. After I understand the problem, it is indeed solvable with the binary lifting technique.

My teammate proposed to try on the solution implementation first, but he had bugs in his code. Later on, while he was having a hard time debugging his implementation, I also tried to implement a working solution. Luckily, my code worked and passed all the test cases.

Solution: Binary Lifting (Code)
Difficulty: Hard

First of all, we noticed that the grid is small (\(\text{row}, \text{col} \le 8\)), so there is at most \(64\) cells.


\[\begin{align*} \text{ways}(c_{pq},\, c_{xy},\, k) = & \text{ number of ways for the pawn piece to travel } \\ & \text{ from cell}(p, q) \text{ to cell}(x, y) \text{ in } k \text{ moves } \end{align*}\]

Now here is the core idea, notice that we can have the following equations.

\[\begin{align*} \text{ways}(c_{pq},\, c_{xy},\, a+b) &= \text{ways}(c_{pq},\, c_{11},\, a) \times \text{ways}(c_{11},\, c_{xy},\, b) \\ &+ \text{ways}(c_{pq},\, c_{12},\, a) \times \text{ways}(c_{12},\, c_{xy},\, b) \\ &\ \ \vdots \\ &+ \text{ways}(c_{pq},\, c_{1n},\, a) \times \text{ways}(c_{1n},\, c_{xy},\, b) \\ &+ \text{ways}(c_{pq},\, c_{21},\, a) \times \text{ways}(c_{21},\, c_{xy},\, b) \\ &+ \text{ways}(c_{pq},\, c_{22},\, a) \times \text{ways}(c_{22},\, c_{xy},\, b) \\ &\ \ \vdots \\ &+ \text{ways}(c_{pq},\, c_{nn},\, a) \times \text{ways}(c_{nn},\, c_{xy},\, b) \\ \text{ways}(c_{pq},\, c_{xy},\, a+b) &= \sum_{i, j\ \in\ \text{all cells}}{\text{ways}(c_{pq},\, c_{ij},\, a) \times \text{ways}(c_{ij},\, c_{xy},\, b)} \end{align*}\]

The above equations can be interpreted as

the number of ways to travel from \(c_{pq}\) to \(c_{xy}\) can be computed from the number of ways to travel from \(c_{pq}\) via a middle cell then to \(c_{xy}\)

that means that if we know the value of \(\text{ways}(c_{pq},\, c_{xy},\, a)\) and \(\text{ways}(c_{pq},\, c_{xy},\, b)\) for all \(p, q, x, y\), we can compute \(\text{ways}(c_{pq},\, c_{xy},\, a+b)\) too, using the equation above.

So now, the binary lifting technique comes into play. Traditionally, the binary lifting technique is used in the LCA algorithm to find the \(n\)-th parent of any node in \(O(\log{n})\) time.

The idea of the binary lifting technique is that we store the solution for 1, 2, 4, 8, 16, and until the maximum power of 2 available. Then, to get the solution for arbitrary number \(n\), we just need to get the binary representation of \(n\) and “add” up all the 1-bits.

In this problem, we store all the values for \(\text{ways}(c_{pq},\, c_{xy},\, 2^a)\) for all \(p, q, x, y,\) and \(a \le 60\) (\(60\) is enough since input constraint is \(10^{18} < 2^{60}\)). That is possible as there will be at most \(8^4 \times 60 \approx 250\text{k}\) values to compute and store which is within the time and memory limits.

The base case is \(\text{ways}(c_{pq},\, c_{xy},\, 1)\) where we just need to simulate the process. For other power of 2, we can use the equation above to compute,

\[\begin{align*} \text{ways}(c_{pq},\, c_{xy},\, 2^{a+1}) &= \text{ways}(c_{pq},\, c_{xy},\, 2^a + 2^a) \\ &= \sum_{i, j}{\text{ways}(c_{pq},\, c_{ij},\, 2^a) \times \text{ways}(c_{ij},\, c_{xy},\, 2^a)} \end{align*}\]

The time complexity to compute the value for \(\text{ways}(c_{pq},\, c_{xy},\, 2^a)\) for all \(p, q, x, y,\) and \(a \le \log{k}\) is \(O(n^4 \cdot n^2 \cdot \log{k}) = O(n^6 \log{k})\). We then store all these values in an array as part of the precomputation process and prepare them for answering each query later.

Note: To better demonstrate the process of computing the solution for each query, we use color coding to determine which value is precomputed and which is not. Note that all the precomputed value are in the form of \(\textcolor{violet}{\text{ways}(c_{pq},\, c_{xy},\, 2^a)}\).

To answer the query, note that the starting cell and ending cell is fixed, let the starting cell be \(c_u\) and ending cell be \(c_v\). Then, to compute \(\text{ways}(c_u,\, c_v,\, k)\), we can use the binary lifting technique for that. For example, to compute the solution when \(k = 22 \ (10110_2)\). We first need to compute \(\text{ways}(c_u,\, c_{xy},\, 110_2)\) for all \(x, y\).

\[\textcolor{lightseagreen}{\text{ways}(c_u,\, c_{xy},\, 110_2)} = \sum_{i, j}{\textcolor{violet}{\text{ways}(c_u,\, c_{ij},\, 10_2)} \times \textcolor{violet}{\text{ways}(c_{ij},\, c_{xy},\, 100_2)}}\]


\[\text{ways}(c_u,\, c_{xy},\, 10110_2) = \sum_{i, j}{\textcolor{lightseagreen}{\text{ways}(c_u,\, c_{ij},\, 110_2)} \times \textcolor{violet}{\text{ways}(c_{ij},\, c_{xy},\, 10000_2)}}\]

after we have \(\text{ways}(c_u,\, c_{xy},\, 10110_2)\) we just output the solution for the cell that \(c_{xy} = c_v\).

For the time complexity, there is \(O(n^2)\) possible value of \(x, y\) and we need to repeat the “addition” process at most \(\log{k}\) time, so the time complexity per query is \(O(n^2 \cdot n^2 \cdot \log{k}) = O(n^4 \log{k})\).

Final P2 - Reviewing Reviews

Problem Statement: pdf

I immediately noticed that I could use rolling hash to solve this problem as it is almost similar to one problem I solved recently, Image Convolution of IEEEXtreme 15.0.

Despite knowing the solution, it still took me some time to implement the rolling hash algorithm (yes, my coding speed is slow), but it got accepted after 20 minutes of coding.

Solution: Rolling Hash (Code)
Difficulty: Medium

The variable in the original problem statement is a little confusing, so let’s declare our own variables. Let

  • \(m \times n\) be the size of the original image where \(m\) is the number of rows and \(n\) is the number of columns
  • \(x \times y\) be the size of the search image where \(x\) is the number of rows and \(y\) is the number of columns

Since rolling hash works in a single dimension, we need to repeat the rolling hash in both directions to get the hash of an image.

In the first dimension, for each row of the original image, we perform the rolling hash horizontally with a window size of \(y\) and store all the hash of all valid windows in another array. The resulting hash array should have a size of \((m) \times (n - y + 1)\).

<TODO: insert image for the resulting hash here>

Then, we repeat the rolling hash process vertically in the other dimension. For each column of the hash array that we obtained from the previous hashing process, we perform the rolling hash with a window size of \(x\) and store all the hash of all valid windows in another array. The resulting hash array should have a size of \((m - x + 1) \times (n - y + 1)\).

<TODO: insert image for the resulting hash here>

We repeat the 2-dimensions rolling hash process on the search image, and we should get a single hash value. Then, we just compare the hash value with each hash in the \((m - x + 1) \times (n - y + 1)\) array and output the number of matching hash.

The total time complexity and space complexity are linear to the size of input, \(O(nm)\).

Final P3 - Ice Cream Shop

Problem Statement: pdf

Solution: Not yet solved but probably maths.
Difficulty: Very Hard

Final P4 - Magic Cables

Problem Statement: pdf

This is an interesting problem. During the contest, I saw many teams solve this problem through the scoreboard. So I thought the solution should be not so hard and thought a naive MST should be able to solve the problem, despite knowing that the naive MST complexity is \(O(n^2 \log{n})\). Unfortunately, it turns out that a naive MST algorithm will not pass the time limit, and I had actually wasted some time coding Prim’s algorithm for this.

After I realized that the naive MST algorithm would not works, I went back to the beginning and started to draw observation. With this in mind, I managed to come up with a solution quite fast and coded it out in 5 min. The solution is actually not that hard.

Solution: Math, Ad-Hoc (Code)
Difficulty: Medium

By making several key observations, the solution will automatically appear.

  1. All the even numbers can connect to number 1 with 0 costs.
  2. For all the odd numbers, if you invert all the bits of that number, you will get a smaller even number. Then that odd number can be connected to that even number with a cost of 0.
  3. For 2., there is only a single exception: the number with all bits is 1, i.e., the number in the form of \(2^k - 1\). This is because the server with ID 0 does not exist, and we cannot use the strategy above. Instead, we can connect to the next number with a cost of 0.
  4. So, if the next number doesn’t exist, we are forced to connect this number with number 1, which will have the minimum possible cost, 1.

So now, the solution is obvious, if the total number of servers is NOT in the form of \(2^k - 1\), output 0, or else, output 1.

Final P5 - Rewards

Problem Statement: pdf

I immediately noticed that this problem could be solved with greedy algorithm after reading the problem. However, when I was done reading the problem statement, one of my teammates was already trying on this problem. So during the competition, I told her the idea to solve, and she managed to submit a working solution during the competition.

Solution: Greedy (Code)
Difficulty: Easy-Medium

The key idea to solve this problem is to work backward, year by year. Instead of start considering the project with the earliest deadline, we start considering the project with the latest deadline.

When working backward year by year, let’s consider only the years before the deadline of the last project since there will not be any more projects that can be chosen after that.

The advantage of working backward is that we can store the projects that are after the current year in a list, and when considering which project to work on this year, we can pick the project that has the highest reward in the list. This is the optimal choice to make because when working backward, all possible projects that we can work on are inside the list, and among all the possible projects, we want to work on the project that returns the highest rewards.

Of course, since we are looking for the highest value in a list, we don’t want to use any simple list. Instead, we should use a priority queue to optimize that process. Also, do note that there might be some years when we have no projects to work on, and remember to use 64-bit integer for this problem.

Final P6 - SeaMoney Network Security

Problem Statement: pdf

The original problem statement for this problem is buggy, extremely confusing, and hard to understand. It took me some time to fully understand the problem statement, and because the statement is buggy, I had to ask for clarification and wait for a reply.

After understanding the problem, it is actually a very simple connected component in a grid problem.

Solution: Graph Traversal, DFS (Code)
Difficulty: Medium

Observe the following.

  1. If a connected component has one initially infected node, the whole component will be infected.
  2. Since only one initially infected node can be removed, if a connected component has more than one initially infected node, the whole component will be infected no matter what.
  3. If a connected component has one initially infected node, removing that initially infected node will prevent the whole component from being infected.

So now, the solution is clear, run a DFS to find the size of the connected component and the number of initially infected nodes in each of the components. Then among all the connected components that have only one initially infected node, find the largest component in size and the ID of the initially infected node in that component. If there is a tie in size, get the one with the smaller ID.

If there is no connected component with only one initial infected node, output 0 (the first infected node) since you cannot reduce any infected machine when the infection ends.

Final P7 - Split the Bill

Problem Statement: pdf

I actually spend quite some time on this problem. At first, I wasted some time because I misunderstood the problem, I thought that the person who would pay the bill was to be determined by us dynamically.

After I reread the problem statement a few more times, I realized that the person that would pay for the bills is actually fixed and pre-determined. Even after realizing that, I proceed with the wrong approach.

My original idea for the problem is that for every triplet of person A, B, C, if A owes B and B owes C, then we can “redirect” the debt to become A owe C (see diagram below). We can keep performing the “redirect” until that for each person A, either A did not owe someone money, or no one owes A money. This is possible because if it is not the case, then we can perform a “redirect” using A as the pivot to reduce the total debt.

<TODO: insert image for the "redirect" process>

I coded that approach and kept getting the wrong answer verdict. Later on, I figured out that this approach has a major flaw and is indeed a wrong solution. For example, it will fail on the following debt graph.

<TODO: insert image for the counterexample>

After I found the counterexample, I immediately realized that the solution might be min-cost flow. So I used my min-cost flow template in here and got accepted after that.

Fast forward until now, when I am writing this writeup, I thought of the problem again. Then I realize that the straightforward traditional min-cost flow implementation might not be the correct solution since the cost definition in this problem is slightly different. But since my unmodified min-cost flow implementation passes all the test cases, I will assume that the solution is correct 😆.

Solution: Min Cost Flow (Code)
Difficulty: Medium-Hard


  • \(\text{debt}(a, b)\) be the total amount of money that \(a\) owes \(b\)

Now consider the following function,

\[\text{nett}(a) = \sum_{i\ \in\ \text{everyone}}{\text{debt}(a, i)} - \sum_{i\ \in\ \text{everyone}}{\text{debt}(i, a)}\]

\(\sum{\text{debt}(a, i)}\) can be interpreted as the total money that \(a\) owes everyone else and \(\sum{\text{debt}(i, a)}\) can be interpreted as the total money that everyone else owe \(a\). So \(\text{nett}(a)\) can be interpreted as the total amount of money that \(a\) should pay to resolve \(a\)’s debt.

Instead of considering who owes who, and who shall pay who, let’s assume that there is a moderator among them. So if \(\text{nett}(a)\) is positive, then \(a\) will need to pay the moderator \(\text{nett}(a)\) amount of money. And if \(\text{nett}(a)\) is negative, then \(a\) will receive \(-\text{nett}(a)\) amount of money from the moderator. Then, everyone is happy, and all debt is resolved.

But in this problem, there is no moderator available, so instead, we need to find the optimal way to pay \(\text{nett}(a)\) amount to the network (or receive from the network if \(\text{nett}(a)\) is negative). This is exactly what min-cost flow can solve.

We model the problem like the following flow graph.

<TODO: insert image for the flow graph here, in the meantime, read the code to understand how the flow graph looks>

Then the answer is just the min-cost flow* of the above flow graph.

*note: to avoid confusing people, the cost required in this problem is not the same as the cost in the traditional min-cost flow algorithm, hence I need to modify this part in my solution to fit the cost definition in this problem to solve this problem, see the difference in min-cost flow template and my solution for this problem. And remember that this might not be the correct solution, but it somehow works.

I kind of have the actual correct solution in my mind when I am writing this, but it will be even more confusing to explain that. So let’s just keep it this way.

Final P8 - Shopee Event

Problem Statement: pdf

I figured out the solution 5 min after I understood the problem. At first glance, we just need to find the size of each connected component, and we can do some simple DFS graph traversal for that.

But that will not work because it is a graph with multiple cliques and extremely many edges. The number of edges in the graph is \(O(n^2)\). So if we naive-ly modeled the graph and tried to run some graph traversal algorithm to find the connected component, it will surely get a time limit exceeded verdict.

But since we only want to find the connected component size, we can use a disjoint set for that and skip the graph traversal.

Solution: Disjoint Set (Code)
Difficulty: Easy-Medium

For each interest group, we select the first person to be the representative. Then for the remaining person, just perform a union operation with the representative using a disjoint set.

In the end, we just query the component size from the disjoint set and output them.

Since all disjoint set operations are amortized constant*, the total time complexity is linear to the input size and the total number of person \(O(n + k)\).

* well, in the context of competitive programming, we usually just pretend that the inverse Ackermann function is constant 😆