Clone Graph (LeetCode #133)
This article is part of a series from Nathan Thomas, a full stack software engineer working in San Francisco, CA. Other recent articles from him include Building Your Own Bitcoin Node and Merge Intervals.
Introduction
If you’re looking for a quick walkthrough of an optimal solution to the LeetCode Clone Graph problem, you’re in the right spot.
This is a question in the Blind 75 LeetCode code challenge list, a group of questions put together by a tech lead at Facebook that’s been touted as a great way to prep for interviews.
Don’t beat yourself up too much if you’re here because it gave you a hard time. Everyone has a hard time with coding challenges sometimes.
Let’s get it started!
Understanding the Problem 🕵🏻♂️
The first thing we should always try to do is understand all the rules for our challenge before writing a single line of code (see Polya’s Problem Solving Techniques):
- We’ll be given a reference as an argument to a root node for a connected undirected graph
- We need to return a deep copy clone of the entire graph
- Each node in the graph contains a value as an
integer
and a list of other nodes that are neighbors - We’ll have between
0
and100
nodes in the connected undirected graph - The values in the nodes will be
1 <= node.val <= 100
- The value for each node will be unique relative to all the other nodes
- There are no repeated edges in the graph and no self-loops either
- We can visit all nodes in the graph starting from the given root node passed in as an argument
I’ll be using Python for the answer for this question since it’s a bit like a universal language — whether you prefer JavaScript, Java, Golang, C, or something else, everyone can “kind of” read Python.
We’ll also be jumping straight into an “optimal” solution. If you’re anything like me, you already attempted this problem before searching for an answer. I’m going to make sure you get a good one.
With all of that being said, let’s get to it!
Setup 🏗
This problem, in my opinion, boils down to being an exercise in the idea of using object references when assigning variables.
You’ll see exactly why in just a second.
The first thing we need to do is setup up an initial copy of the root node along with a cache and a stack. Again, you’ll see exactly why in just a second.
Notice that we have a class of Node
here. This was provided to us by LeetCode, but we’ll need it to complete this problem (and it also gives us a reference of what the input graph nodes look like).
First, we return immediately if the value of the node
input into our function is None
.
If that’s not the case, we should create a new root
node which is a copy of the initial node we’re given — or rather, just a copy of the value in it. We’ll pass in an empty list for the neighbors
list and copy it later (you’ll see why soon).
Next, we want to track any nodes that we have visited in our cache
. We’ll start out with just a reference to the new root
node that we’ve created.
Finally, we create a new stack
variable to track all the neighbor sets that we still need to recreate and visit. We can build this by storing a list where the first value is the node value
and the second value is the neighbors
list associated with that value.
See what we’ve done here? We’ve effectively split up copying our nodes (by storing a reference to the newly-copied node in cache
) and the copying of our neighbors (by preserving the work that needs to be done copying them via using a stack
).
Okay, let’s keep going!
Building Our Loop 🔁
Now that we have a cached reference where we’ll store every node (keyed by the value of that node, which is fortunately unique) and a worker stack where we keep references of [node.val, node.neighbors]
that we need to iterate over and copy, we can get down to business.
First, we need to build a loop that runs for as long as the stack
has length. Inside of that loop, we’ll pop off the top of the stack
and get the value and neighbors from it:
The next thing we need to do is create a new node. We can do that by initializing a Node
class (which was provided to us) and pass in the value
along with an empty list:
Whoa. Wait. How come we’re also looking in the cache
for value
?
Well, we might have previously created this node in a previous iteration of our loop. If we did, we already put it into our cache
. If that’s the case, we want to harness that and assign that node as the value. We’re pulling out that reference pointer to the cache[value]
and using that instead.
Nice.
The next thing we need to do in our loop to look at every neighbor for the neighbors
list of the current node we’re viewing.
We can build a simple loop that will iterate over them and, for each one, create an initial node:
The next thing we need to do is check if we already created this neighbor as a node in another iteration of our loop. Just like before, we’re going to dip into our cache
to check if we have saved a reference there.
Look familiar? We’re ultimately just saving references in our cache
and using it later on to fetch the right node reference if it already exists and share it amongst our other neighbors
lists.
If it does exist in the cache
, we should immediately overwrite our new new_neighbors
variable with it.
But if the value does not exist in the cache
, this means it’s the first time we’ve looked at that node. If that’s the case, we should immediately store the node in the cache
for other iterations to use later:
What’s that last line doing?
You see, we set the new_neighbor
list of neighbors
as an empty list. We now need to add a “job” to the stack
so that, in a depth-first search manner, we can pop that value off the stack
and do exactly what we’re doing here for that node’s neighbors
list.
Therefore, we can append [neighbor.val, neighbor.neighbors]
to our stack
.
Does that make sense?
The last thing we need to do here is append the reference to this new_neighbor
node to the new_node
's neighbors
list:
BOOM. As you can see, we can merely return the root
node that we created at the start of all of this once the while
loop is finished running and iteratively creating nodes.
This entire algorithm will run in a time complexity of O(n + m)
where n
is the number of nodes and m
is the the maximum number of neighbors references that any single node has (which could be as many as n
in the worst case). It has a space complexity of O(n + m)
. You could probably just shorten this to O(n)
where n
is whichever is greater between the total number of graph nodes or number of neighbors.
Conclusion
Nice job on making it to the end! I hope this explanation helped you out a bit.
Keep an eye out for the rest of these Blind 75 coding challenges articles. It’s going to be a fun ride.
Feel free to reach out (using the links below) if you want to talk about coding, cool tech, or anything else (I really like book recommendations).
Thanks for reading. 🔥
Nathan
(GitHub, LinkedIn, Twitter, and Personal Site)