How to Visually Understand Your Site Structure and Link Strength

We have all heard the saying ‘a picture is worth a thousand words’, and that wisdom also applies to your website. There are several reasons to visualize your site’s structure, particularly for pages that acquire incoming links. Third-party applications can turn your crawl and link data into visual representations that reveal patterns which are hard to spot in raw data or spreadsheets. These visuals help you understand how your website is structured and where its link strength is concentrated.

As an internet marketing company, here are the steps we recommend for building a visual representation of your site as it relates to incoming links.

Step #1 Screaming Frog

Run Screaming Frog to gather all of your internal page data as well as the page link structure. You can download a free version of Screaming Frog that will analyze up to 500 URLs on a website. Once it is installed, crawl the website that you want to map. You can exclude images, CSS and JavaScript from the crawl and focus only on page URLs and subdomains. Once the report has been generated, clean up the data as follows:

  • Click the Bulk Export tab, then Response Codes, then Success (2xx) Inlinks
  • Download the file and open it
  • Delete the first column named ‘Type’
  • Delete the first row named ‘Inlinks’
  • Delete all columns other than ‘Source’ and ‘Destination’
  • Rename the ‘Destination’ column to ‘Target’
  • Save the edited file

You will be left with a file that contains only the Source and Target columns. Perform a quick review and delete any anomalies, such as rows with a hashtag (#) in the Target column.
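For larger crawls, the same clean-up can be scripted rather than done by hand. The following is a minimal Python sketch, not a feature of Screaming Frog itself. It assumes the export is a CSV whose header row contains ‘Source’ and ‘Destination’ columns, possibly preceded by a title row such as ‘Inlinks’; the function names clean_inlinks and to_edges_csv are our own.

```python
import csv
import io


def clean_inlinks(raw_csv_text):
    """Return (source, target) pairs from an inlinks export given as text."""
    rows = list(csv.reader(io.StringIO(raw_csv_text)))
    # Assumption: the header is the first row that names both columns,
    # so a leading title row such as "Inlinks" is skipped automatically.
    header_idx = next(
        i for i, row in enumerate(rows)
        if "Source" in row and "Destination" in row
    )
    header = rows[header_idx]
    src_col = header.index("Source")
    dst_col = header.index("Destination")

    edges = []
    for row in rows[header_idx + 1:]:
        if len(row) <= max(src_col, dst_col):
            continue  # skip short or empty rows
        source = row[src_col].strip()
        target = row[dst_col].strip()
        # Drop anomalies: empty cells and targets containing a hashtag (#).
        if not source or not target or "#" in target:
            continue
        edges.append((source, target))
    return edges


def to_edges_csv(edges):
    """Serialize the pairs with the Source/Target header used for the import."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return out.getvalue()
```

Read the exported file into a string, pass it to clean_inlinks, and save the result of to_edges_csv as the file you import in the next step. It is still worth reviewing the output by eye, since the script only removes the anomalies described above.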

Step #2 Gephi

Gephi is an open-source visualization tool that turns the data you pulled from Screaming Frog into a picture. Open Gephi and import your file via File -> Import. A new window will pop up and prompt you to select the ‘Edges Table’ option. (In Gephi, a Node represents a page and an Edge represents a link between two pages.)

In the next window, make sure to check the ‘Create Missing Nodes’ box. You will then be taken to the Overview tab and presented with a visual that looks like a blurry black square. At the top of the page, click on the ‘Data Laboratory’ tab and export the Nodes. When you open the exported file, it will contain three columns: ID, Label and Time Set.

Add a fourth column for the number of referring domains, which will come from a Search Console report.

Step #3 Search Console

To gather data on the number of external links to your pages, log in to your Search Console account, click on the ‘Search Traffic’ tab and select ‘Links to Your Site.’ Then navigate to the ‘Your Most Linked Content’ column and click ‘More’, then ‘Download This Table.’

Open the spreadsheet, insert a new column before the URL path and enter your domain in its first cell. Double-click the bottom right corner of that cell to copy the domain down to the bottom of the spreadsheet.

  • Select the data from the first two columns (domain and path) and copy to Notepad
  • Find and replace any ‘//’ with ‘/’, leaving the ‘//’ in ‘http://’ or ‘https://’ intact
  • Paste into the second column and delete the first column

Your list is now complete with the full URLs.
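If you would rather not round-trip through Notepad, the join can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name full_url is hypothetical, and it assumes each row holds a domain (with its protocol) and a path, as in the spreadsheet described above.

```python
def full_url(domain, path):
    """Join a domain and a path with exactly one slash between them."""
    # Trimming the slashes on each side avoids a stray "//" in the result
    # without touching the "//" that belongs to "https://".
    return domain.rstrip("/") + "/" + path.lstrip("/")


def full_urls(domain, paths):
    """Apply full_url to every path in the downloaded list."""
    return [full_url(domain, path) for path in paths]
```

For example, full_urls("https://example.com/", ["/blog/post", "about"]) gives one full URL per path, with no doubled slashes.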

Step #4 Add to Gephi

In the fourth column you created in the exported Gephi Nodes file, add the referring domains from your Search Console report. Add a fifth column and name it ‘Modularity_Class.’

Temporarily copy the data to a second spreadsheet and enter the following formula into cell D2:

=IFERROR(INDEX('search console'!$C$2:$C$138,MATCH(A2,'search console'!$A$2:$A$138,0),1),"0")

Note that 138 is the last row of the Search Console list in this example and will need to be changed to match the number of rows in your own list. Once complete, copy the referring domains column and use the paste values command to enter the data into your original spreadsheet under the Referring Domains column. In the Modularity_Class column you created, you can assign different values to different page categories, such as 1=Blog Posts, 2=Services, 3=Company Info, etc.
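The lookup and the category assignment can also be sketched in Python, which avoids adjusting the row number by hand. Treat this as a rough equivalent of the spreadsheet formula rather than a required step. The names build_node_rows, classify_by_path and nodes_csv are our own, the ‘/blog/’ and ‘/services/’ URL patterns are placeholders for your own site sections, and the ‘Id’ and ‘Label’ column headers are an assumption that should be matched to your exported Nodes file.

```python
import csv
import io


def classify_by_path(url):
    """Hypothetical rule: 1=Blog Posts, 2=Services, 3=everything else."""
    if "/blog/" in url:
        return 1
    if "/services/" in url:
        return 2
    return 3


def build_node_rows(node_ids, referring_domains, classify):
    """Attach a referring-domain count and a category to every node URL."""
    rows = []
    for url in node_ids:
        rows.append({
            "Id": url,
            "Label": url,
            # Like the IFERROR fallback: pages with no external links get 0.
            "Referring Domains": referring_domains.get(url, 0),
            "Modularity_Class": classify(url),
        })
    return rows


def nodes_csv(rows):
    """Serialize the rows as CSV text, ready to be saved as Nodes.csv."""
    fields = ["Id", "Label", "Referring Domains", "Modularity_Class"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Here referring_domains is a dictionary that maps each full URL from the Search Console list to its count of referring domains.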

Save the file as Nodes.csv and import the spreadsheet into the current Gephi project you have open by using the Import Spreadsheet button on the Data Laboratory screen.

On the following screen, make sure that the Referring Domains and Modularity_Class columns are set to ‘Float’ and uncheck the ‘force nodes to be created as new ones’ box. Click Next and you will be taken to a data overview screen. Run the PageRank simulation and assign a different colour to each of the modularity classes you created. You will now begin to see your visual take shape, with coloured dots representing the Nodes scattered throughout the original black square.
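To make the PageRank step less of a black box, here is a minimal sketch of the idea in Python. It is a textbook power-iteration version written for illustration, not Gephi’s own implementation, and its scores may differ from the ones Gephi reports. The damping factor of 0.85 and the 50 iterations are assumed defaults, and the function name pagerank is ours.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Estimate PageRank from a list of (source, target) link pairs."""
    nodes = sorted({page for edge in edges for page in edge})
    out_links = {page: [] for page in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {page: 1.0 / count for page in nodes}  # start with equal scores
    for _ in range(iterations):
        # Score held by pages with no outgoing links is shared by all pages.
        dangling = sum(rank[p] for p in nodes if not out_links[p])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {page: base for page in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            # Each page splits its score evenly across the pages it links to.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

Pages that receive links from many pages, or from pages that are themselves well linked, end up with higher scores, which is what the node sizing in the graph reflects.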

Your graph can be shaped by different variables. We recommend using the ‘referring domains’ attribute to highlight the colour-coded sections of your website. You can right-click any colour-coded node to view the individual page.

Essentially, this visual lets you quickly locate anomalies within your site and see how pages are grouped, so that you can find opportunities to improve the flow of your internal link weight and PageRank.

This is just one way that you can visualize data from your website to help you optimize and improve your overall SEO rank. For more helpful tips on how you can learn to scale your business, download our free internet marketing eBook.