Career-guider: A Flask App with a Real-Time Web Scraper

Rajat Mudaliar
5 min read · Apr 6, 2021

This post explains a web app that I built and deployed on the Heroku platform. It scrapes job listings, analyses them, and suggests which skills you should learn next according to the current trend.

The frameworks, tools and languages used are Python, Flask, Selenium, Beautiful Soup, Jinja2 and Bootstrap.

The steps to create this app are as follows:

1. Writing a Python function using Selenium and Beautiful Soup to scrape the job portal.
2. Using scraped information to visualize the data.
3. Creating a Flask app.
4. Common base template for all pages of our website.
5. Designing home page with a form.
6. Creating a result page that visualizes and creates a table of data.

1. Writing a Python function using Selenium and Beautiful Soup to scrape the job portal.

For the app to work, it needs data that matches the details submitted in the form. To keep that data in line with the current trend, I built a live web scraper that runs after the form is submitted and then analyses the results. I used Selenium to automate browsing the job portal website and Beautiful Soup to scrape all the listings.

The GitHub repo with all the code: https://github.com/rajat-mudaliar/Career-Guider.git

Python function to browse and scrape data into list.

This function takes as input the job profile and location that the user enters in the web page form. To use Selenium you need a driver on your system that controls the browser (Chrome, Firefox). Using the driver, the function opens the job portal and fills in the job profile and location, locating the form fields by XPath. A loop then scrapes the current page, finds the next button and moves on to the next page. The website I scraped shows 20 job listings per page, and a Selenium wait is used to pause until each page has loaded completely.

From what I read on several websites, and from comparing the runtime of my own code, Beautiful Soup is considerably faster than Selenium at scraping a web page, while Selenium is best suited to automation. So I used Beautiful Soup to scrape the job description, company name, required skills and the link to apply for each job, and created a dataframe from this data.

I used an action chain to scroll to the next button before clicking it, because Selenium interacts most reliably with elements that are visible on the current screen. The function is continued in the next code block.
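The paging loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the repo: the CSS selectors, the XPaths, the Job record and the function names parse_listings, scrape_pages and scrape_with_selenium are all hypothetical, and the real job portal's markup will differ. Selenium and Beautiful Soup are imported inside the functions that need them, so the paging logic can be tried without a browser driver installed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """One scraped listing (field names are hypothetical)."""
    description: str
    company: str
    skills: List[str]
    link: str


def parse_listings(page_source):
    """Parse one results page with Beautiful Soup. All selectors are assumptions."""
    from bs4 import BeautifulSoup  # third-party, imported lazily

    soup = BeautifulSoup(page_source, "html.parser")
    jobs = []
    for card in soup.select("article.job-card"):
        title = card.select_one("a.title")
        jobs.append(Job(
            description=title.get_text(strip=True),
            company=card.select_one("a.company").get_text(strip=True),
            skills=[li.get_text(strip=True) for li in card.select("ul.skills li")],
            link=title["href"],
        ))
    return jobs


def scrape_pages(get_source, go_next, parse=parse_listings, max_pages=5):
    """Parse the current page, then advance until no next page or max_pages is hit."""
    results = []
    for _ in range(max_pages):
        results.extend(parse(get_source()))
        if not go_next():
            break
    return results


def scrape_with_selenium(profile, location, url, max_pages=5):
    """Drive a browser with Selenium and hand each page's HTML to Beautiful Soup."""
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 15)
    card = (By.CSS_SELECTOR, "article.job-card")  # hypothetical selector
    try:
        driver.get(url)
        # Hypothetical XPaths for the search form.
        box = wait.until(EC.presence_of_element_located(
            (By.XPATH, "//input[@name='keyword']")))
        box.send_keys(profile)
        driver.find_element(By.XPATH, "//input[@name='location']").send_keys(location)
        driver.find_element(By.XPATH, "//button[@type='submit']").click()

        def get_source():
            wait.until(EC.presence_of_element_located(card))
            return driver.page_source

        def go_next():
            try:
                button = driver.find_element(By.XPATH, "//a[contains(@class, 'next')]")
            except NoSuchElementException:
                return False
            first_card = driver.find_element(*card)
            # Scroll to the button, click it, then wait for the old page to go away.
            ActionChains(driver).move_to_element(button).click().perform()
            wait.until(EC.staleness_of(first_card))
            return True

        return scrape_pages(get_source, go_next, max_pages=max_pages)
    finally:
        driver.quit()
```

Keeping the loop in scrape_pages separate from the browser code means the paging behaviour can be checked with plain functions, while the Selenium part stays a thin wrapper.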

2. Using scraped information to visualize the data.

The scraped data was stored in lists. To analyse the skills, I flattened the list of lists into a single list and converted every entry to lower case, so that the same skill written with different capitalisation is not counted separately. Then, using the nltk library, I computed the frequency of every required skill. Suppose you are looking for data engineer jobs: the frequency plot shows that python and big data are mentioned most often across the job listings. Using matplotlib, I plotted the frequencies as a bar chart so the user can see which skills are most in demand.

Because the app was deployed on Heroku, I could not save the plot as an image file (.jpeg, .png) in the project folder. Instead, I wrote the plot to an in-memory bytes buffer (io.BytesIO), encoded it, and later passed the encoded image to the result web page to be displayed to the user.
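As a rough illustration of this step, the sketch below counts skills and renders the bar chart into memory. It makes a few assumptions that are not confirmed by the post: the standard library's collections.Counter stands in for the nltk frequency count (nltk's FreqDist is itself a Counter subclass), the image is encoded as base64, and the function names skill_frequencies and plot_to_base64 are hypothetical. matplotlib is imported inside the plotting function so the counting part runs without it.

```python
import base64
import io
from collections import Counter


def skill_frequencies(skills_per_job):
    """Flatten a list of skill lists, lower-case each entry, and count occurrences."""
    flat = [
        skill.strip().lower()
        for job_skills in skills_per_job
        for skill in job_skills
        if skill.strip()
    ]
    return Counter(flat)


def plot_to_base64(freq, top_n=10):
    """Draw the top_n skills as a bar chart and return the PNG as a base64 string."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed on a server
    import matplotlib.pyplot as plt

    top = freq.most_common(top_n)
    labels = [skill for skill, _ in top]
    counts = [count for _, count in top]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(labels, counts)
    ax.set_ylabel("Mentions in job listings")
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()

    buffer = io.BytesIO()            # the plot is written to RAM, not to disk
    fig.savefig(buffer, format="png")
    plt.close(fig)                   # free the figure between requests
    return base64.b64encode(buffer.getvalue()).decode("ascii")


freq = skill_frequencies([["Python", "SQL"], ["python ", "Big Data"], ["Big data", "PYTHON"]])
print(freq.most_common(2))
```

The returned string can be handed to a template as is, which avoids writing any file on the server.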

3. Creating a Flask app.

Because the project was lightweight and needed only a few web pages, I chose the Flask framework instead of Django. First I created an app and added two pages: home, which hosts the form, and display, which shows the result. The home function simply renders the home page for the client. The disp function reads the form details into Python and passes them to the scraping function created earlier. The output of that function (the encoded image and the scraped lists) is then rendered into the display page. debug=True was used during testing so that the server reloads automatically when the code changes and shows detailed errors in the browser.
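A minimal sketch of such an app is shown below. It is an illustration under assumptions rather than the code from the repo: the route paths, the form field names profile and location, the template file names, and the helpers read_search_form and create_app are hypothetical, and analyse stands for whatever function does the scraping and plotting. Flask is imported inside the factory so the form helper can be tried on its own.

```python
def read_search_form(form):
    """Pull the two form fields out of any mapping (for example request.form)."""
    profile = form.get("profile", "").strip()
    location = form.get("location", "").strip()
    if not profile:
        raise ValueError("job profile is required")
    return profile, location


def create_app(analyse):
    """Build the app; analyse(profile, location) must return (plot_b64, rows)."""
    from flask import Flask, render_template, request

    app = Flask(__name__)

    @app.route("/")
    def home():
        # The home page only renders the form.
        return render_template("home.html")

    @app.route("/display", methods=["POST"])
    def disp():
        try:
            profile, location = read_search_form(request.form)
        except ValueError as err:
            return str(err), 400
        plot_b64, rows = analyse(profile, location)
        return render_template("display.html", plot=plot_b64, rows=rows)

    return app


# Local testing only (debug mode enables the reloader and the interactive debugger):
#   create_app(my_analyse_function).run(debug=True)
```

Debug mode is convenient while developing, but the interactive debugger it enables can execute arbitrary code, so it should stay off on a public deployment.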

4. Common Base template for all pages of our website.

Base template created using Bootstrap along with Jinja2.

To make the task easier and keep all pages uniform, I used Jinja2 templates for every web page. The pages were designed with Bootstrap, which is easy to use and provides ready-made templates and sample forms. The base template defines blocks for the title and the content, which are the parts that change on every page.

The template that all web pages will follow.
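To show how the title and content blocks work, here is a small sketch that keeps two templates in Python strings and renders them with Jinja2's DictLoader. The markup, the about.html page and the render_page helper are hypothetical stand-ins, not the templates from the repo, and the Bootstrap link is left as a comment. Jinja2 is imported inside the helper.

```python
# Shared layout: every page gets this skeleton and fills in the two blocks.
BASE_HTML = """<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{% block title %}Career-guider{% endblock %}</title>
  <!-- the Bootstrap stylesheet would be linked here -->
</head>
<body>
  <main class="container">
    {% block content %}{% endblock %}
  </main>
</body>
</html>"""

# A child page only states what differs from the base.
ABOUT_HTML = """{% extends "base.html" %}
{% block title %}About{% endblock %}
{% block content %}<p>Only this part changes from page to page.</p>{% endblock %}"""


def render_page(name, **context):
    """Render one of the in-memory templates by name."""
    from jinja2 import DictLoader, Environment

    env = Environment(
        loader=DictLoader({"base.html": BASE_HTML, "about.html": ABOUT_HTML}),
        autoescape=True,
    )
    return env.get_template(name).render(**context)
```

Calling render_page("about.html") would return the full base layout with the title and content blocks replaced by the child's versions. In a Flask project the same templates would normally live as files in the templates folder.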

5. Designing home page with a form.

Homepage code

The {% extends "base.html" %} statement pulls in the base template, so this page does not have to repeat the shared markup. Then, using Bootstrap, I created a form with a text box, a select field and a submit button that triggers the web scraper and loads the display page.

Homepage.
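A home page along these lines could look like the sketch below. The field names, the /display action, the list of locations and the Bootstrap class names are assumptions for illustration, and BASE_STUB is a one-line stand-in for the real base template. The small form_field_names helper lists the name attributes, which are the keys the Flask view would later read from the submitted form.

```python
import re

# One-line stand-in for the real base template (hypothetical).
BASE_STUB = "<title>{% block title %}{% endblock %}</title>{% block content %}{% endblock %}"

HOME_HTML = """{% extends "base.html" %}
{% block title %}Home{% endblock %}
{% block content %}
<form action="/display" method="post">
  <input class="form-control" type="text" name="profile" placeholder="Job profile" required>
  <select class="form-select" name="location">
    {% for city in locations %}<option value="{{ city }}">{{ city }}</option>{% endfor %}
  </select>
  <button class="btn btn-primary" type="submit">Search</button>
</form>
{% endblock %}"""


def form_field_names(template):
    """Return the name attributes found in the template, in document order."""
    return re.findall(r'name="(\w+)"', template)


def render_home(locations):
    """Render the home page with Jinja2 (imported lazily)."""
    from jinja2 import DictLoader, Environment

    env = Environment(
        loader=DictLoader({"base.html": BASE_STUB, "home.html": HOME_HTML}),
        autoescape=True,
    )
    return env.get_template("home.html").render(locations=locations)


print(form_field_names(HOME_HTML))
```

Submitting the form sends a POST request to the action URL, so the route that handles it has to accept POST and read the same field names.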

6. Creating a result page that visualizes and creates a table of data.

I first extended the base page and then displayed the plot from the encoded image. Then, using the zipped lists, I created a table showing the details of each job along with a link to apply for it directly.

Output Plot
Output Table with job description, company name, skills required and link to apply.
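The result page can be sketched as follows, assuming the plot arrives as a base64-encoded PNG and the scraped data as four parallel lists. The helper names build_rows and png_data_uri, the row keys and the template markup are hypothetical. Jinja2 is imported inside render_display, and the template is written without a base so the snippet stays self-contained.

```python
def build_rows(descriptions, companies, skills, links):
    """Zip the parallel lists into one dict per job (zip stops at the shortest list)."""
    rows = []
    for description, company, job_skills, link in zip(descriptions, companies, skills, links):
        rows.append({
            "description": description,
            "company": company,
            "skills": ", ".join(job_skills),
            "link": link,
        })
    return rows


def png_data_uri(plot_b64):
    """Wrap a base64 PNG so it can be used directly as an <img> src."""
    return "data:image/png;base64," + plot_b64


DISPLAY_HTML = """<img class="img-fluid" src="{{ plot_uri }}" alt="Skill frequency plot">
<table class="table">
  <thead><tr><th>Job description</th><th>Company</th><th>Skills</th><th>Apply</th></tr></thead>
  <tbody>
  {% for row in rows %}
    <tr>
      <td>{{ row.description }}</td>
      <td>{{ row.company }}</td>
      <td>{{ row.skills }}</td>
      <td><a href="{{ row.link }}">Apply</a></td>
    </tr>
  {% endfor %}
  </tbody>
</table>"""


def render_display(plot_b64, rows):
    """Render the result page body with Jinja2 (imported lazily)."""
    from jinja2 import Environment

    env = Environment(autoescape=True)
    return env.from_string(DISPLAY_HTML).render(plot_uri=png_data_uri(plot_b64), rows=rows)
```

Embedding the image as a data URI means the page needs no separate image request and no image file on the server.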

This project was deployed on Heroku cloud.

https://career-guider.herokuapp.com/
