Longest Common Subsequence (LeetCode #1143)
This article is part of a series from Nathan Thomas, a full stack software engineer working in San Francisco, CA. Other recent articles from him include Building Your Own Bitcoin Node and House Robber II.
Introduction
If you’re looking for a quick walkthrough of an optimal solution to the LeetCode Longest Common Subsequence problem, you’re in the exact right place.
This is a question in the Blind 75 LeetCode code challenge list, a group of questions put together by a tech lead at Facebook that’s been touted as a great way to prep for interviews.
Don’t beat yourself up too much if you’re here because it gave you a hard time. Everyone has a hard time with coding challenges sometimes.
Let’s get it started!
Understanding the Problem 🕵🏻♂️
The first thing we should always try to do is understand all the rules for our challenge before writing a single line of code (see Polya’s Problem Solving Techniques):
- We’ll be given two strings,
text1
andtext2
- We must return the length of the longest common subsequence between them
- If there is no common subsequence, return
0
- Each of the strings will have a length of
1 <= len(string) <= 1000
- The strings will consist of only English lowercase characters
The problem defines a common subsequence as “a subsequence that is common to both strings” like:
text1 = "abcde", text2 = "ace"common subsequence = "ace"
Note that the common subsequence letters do not need to be immediately sequential in both strings.
I’ll be using Python for the answer for this question since it’s a bit like a universal language — whether you prefer JavaScript, Java, Golang, C, or something else, everyone can “kind of” read Python.
We’ll also be jumping straight into an “optimal” solution. If you’re anything like me, you already attempted this problem before searching for an answer. I’m going to make sure you get a good one.
Sound good? Okay. Let’s get into it.
Setup ⚙️
This is another one of the dynamic programming problems that we’ll be solving together. However, this is the first one that we’ve discussed that uses a 2D matrix in order to produce a solid solution. These are always interesting and not incredibly intuitive unless you spend some time really mulling over how they work (which can help you with future problems in this category).
The first thing we want to do is to create a 2D list matrix
variable:
Whoa! Hold up. What’s going on here?
I’m glad you asked.
We’re essentially building a matrix
where the y
or height is the length of text1
(plus 1
) and the x
or width is the length of text2
(plus 1
).
First off, why are we adding 1
to the lengths?
What we’re doing here is providing a base case to use for our dynamic programming. We’re going to be adding up values, but we have to start from somewhere, right?
If we didn’t add an extra + 1
to the height and width of our matrix
, this is what it would look like:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
]
However, this only leaves us room for our calculations (this part will all make more sense in the next section). We need a “base case” to begin our calculations off of. If we add an extra + 1
to the height and width of the matrix
, this is what it will look like:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
]
Just think of it as extending the matrix
a bit with an extra column and row.
Okay, now that our matrix
is set up, let’s get to actually doing some calculations on it!
Building Our Loops 🔁
The first thing we want to do is to build two nested loops. One will iterate over our y
values, and one will iterate over our x
values.
Here’s what that looks like in code form:
Okay, hold up again. Why do we have all the -1
values here?
So the range(len(text1) — 1, -1, -1)
is Python’s way of writing, “Start from the last index of text1
, subtract 1
each loop iteration, and continue until we hit the number greater than -1
(which would be 0
).
We’re doing this syntax for both our y
and x
values (text1
and text2
), so we’re essentially starting from the position here with the big X and then working backwards:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, X, 0],
[0, 0, 0, 0, 0, 0],
]
We’re setting up our loops to iterate backwards from here but we have an extra row and column. Can you see how we now have values below and to the right of our starting location? If we didn’t set up our matrix
with an extra row and column, here’s what that same matrix
would look like:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, X],
]
From here, we need to do two things every time our inner loop runs.
First, we need to check if text1
at y
index is equal to text2
at x
index. If it is, we need to set the current matrix[y][x]
position equal to 1
plus whatever the lower right diagonal matrix
value is.
See why we needed an extra row/column for our dynamic programing matrix
?
This is what that looks like in code form:
This case above that we just wrote means that we successfully found a matching character and are essentially adding 1
to that index and tallying up the immediate lower right diagonal value with it.
Let’s keep going and hopefully this will all make more sense in just a minute!
If we don’t find matching characters, we need to set our current matrix location equal to the maximum of either the position immediately below or to the right of our current one.
This is what that looks like in code form:
To make this obvious what is happening, let’s say we start with the same matrix we’ve been using as a demonstration so far:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, X, 0],
[0, 0, 0, 0, 0, 0],
]
When we start out (where X is), we’re comparing the last index of each of our words. They match (the last character is c
), so we set the value to:
1 + matrix[y + 1][x + 1]
This ends up looking like this:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, X, 1, 0],
[0, 0, 0, 0, 0, 0],
]
Notice that we’ve moved to backwards to the next x
position thanks to our reverse iterating loops. For this position, the characters of f
and c
do not match. Because of that, we set the current position in our matrix equal to the maximum of the bottom and right positions.
This is what that looks like:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, X, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
]
I’m going to go ahead and finish up this row since none of the other characters in text2
will match the last character of text1
:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, X, 0],
[1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
]
Okay, now this is where things get really special. This current position where X is runs and compares c
from text2
against b
from text1
. Those are not equal, so we set the value as the maximum of either the bottom or right.
The same is true for the next position where we compare f
and b
.
This is what our matrix looks like right now:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, 0, X, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
]
However, now we just hit a case where we’ve once again found a matching character. The position marked by X is comparing b
from text2
against b
from text1
. When we hit this, we add one to the bottom-right diagonal value in the matrix.
This is what we come up with:
text1 ="abc"
text2 = "agbfc"matrix = [
[0, 0, 0, 0, 0, 0],
[0, X, 2, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
]
This is the power of dynamic programming. We’re allowing the computer to dynamically calculate the answer for us according to our rules.
I won’t go through the rest of the positions (you can walkthrough the rest if you want before continuing), but here’s what the finished matrix
would look like:
matrix = [
[3, 2, 2, 1, 1, 0],
[2, 2, 2, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
]
Notice that the last position (at matrix[0][0]
) has a matching character again, so we end up with a final value of 3
.
As it so happens, that’s the actual correct longest common subsequence as abc
can be found within agbfc
.
All we need to do is return the matrix[0][0]
value:
There you go. That’s it. That’s the solution, and now you understand why it works!
This will run in a time complexity of O(n * m)
where n = len(text1)
and m = len(text2)
. The space complexity will also be O(n * m)
as we’re creating a matrix based on the length of these two strings.
Conclusion
BOOM. Nice work. You made it, and I hope this solution helped you out.
Keep an eye out for the rest of these Blind 75 coding challenges articles. It’s going to be a fun ride.
Feel free to reach out (using the links below) if you want to talk about coding, cool tech, or anything else (I really like book recommendations).
Thanks for reading. 🔥
Nathan
(GitHub, LinkedIn, Twitter, and Personal Site)