Traverse HTML DOM Tree

Photo by Jackson So on Unsplash

We need to figure out a way to traverse the DOM structure that we obtain from a single web page. The HTML can be represented in a tree structure where the children of the HTML tag become the children of a node in the tree. Each level of the tree can have any number of nodes depending upon the number of nested HTML tags. We need to traverse the nodes in the tree level by level, starting at the root node.

We’ll be provided with a root node of an n-ary tree. The root node in our case will be the body tag. This root node will have the complete web page nested within its children. We have to return the values of each levels’ nodes from left to right in separate subarrays. This way we can separately analyze their content for further data processing.

Let’s try to understand this better with an illustration:


Since we need to traverse all the nodes of each level before moving onto the next level, we can use the Breadth First Search (BFS) technique to solve this problem. We can use a queue to efficiently traverse in BFS fashion.

Let’s see how we might implement this functionality:

  1. Start by enqueuing the root node to the queue.
  2. Iterate until the queue is empty.
  3. During each iteration, first, count the elements in the queue (let’s call it queue_size). We will have this many nodes at the current level.
  4. Next, remove queue_size nodes from the queue and enqueue their value in a list to represent the current level.
  5. After removing each node from the queue, insert all of its children into the queue. If the queue is not empty, repeat from step 3 for the next level.

The following illustration might clarify this process.

Let’s look at the code for the solution:

Let n is the total number of nodes in the tree.

Time complexity

The time complexity of the above algorithm will be O(n). This is because we traverse each node once.

Space complexity

The space complexity of the above algorithm will be O(n) because we need to return a list containing the level order traversal. We will also need O(n) space for the queue. Since we can have a maximum of n — 1 node at any level (this can happen only at the lowest level), we will need O(n) space to store them in the queue.

Full Stack Programmer, love to solve problem’s during free time.