Scraper CLI Project
M*A*S*H Scraper Project
Project Parameters:
Build a CLI program using Ruby that utilizes website scraping or an API
This was our first project, the first time we would apply the things we had learned so far.
The project goal was to use scraping or an API to gather information from a website of our choice and then use that information to build a CLI (Command Line Interface) program.
The first thing I had to learn was how to use a gem called "Nokogiri" to scrape the website. Nokogiri parses the page into special nodes, collected in array-like sets that you can access and iterate over to pull out the information you want. This presented my first big challenge: CSS selectors.
To filter through these Nokogiri nodes you use CSS selectors. To figure out which selectors to use, you combine pry (dropping a binding.pry into your code) with inspecting the source of the website. I had difficulty at first because none of the selectors I tried were returning the information I wanted. I discovered, though, that if you right-click on an element in the browser's inspect panel and go to "Copy", there is a "Copy Selector" option. While this option can sometimes be too specific, it gives you a much better idea of which selectors you need than just guessing.
Once you have that done you need to clean up the information, because it will often contain special characters, markup-specific artifacts and so on. One of the easiest ways to start is to convert the nodes to a string using .text. Once you have the string, you can clean it. In my case it contained a few special characters such as "\n" (newline). For this I used .gsub, which takes a pattern and replaces every match in the string. I found the most efficient way to remove several different items was to combine the parts I wanted to remove into a single regular expression and then, since .gsub replaces rather than removes, replace each match with an empty string ("").
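That cleanup step can be sketched like this. The sample string is invented; Regexp.union is one standard way to fold several unwanted substrings into one pattern for a single gsub pass.

```ruby
# A string like what .text might return, padded with scraping artifacts.
raw = "\n\tRadar O'Reilly\n"

# Regexp.union builds one regex that matches any of the listed parts,
# so a single gsub call can strip all of them at once.
unwanted = Regexp.union("\n", "\t")

# gsub replaces every match; replacing with "" effectively removes them.
clean = raw.gsub(unwanted, "")
puts clean  # => "Radar O'Reilly"
```

For simple whitespace-only cleanup, String#strip is an even shorter alternative, but the gsub approach generalizes to any set of unwanted substrings.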
Once that was done building the CLI was just a matter of building it out and plugging in the scraped data as needed.
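A bare-bones version of that final step might look like the following. The episode list and method names are placeholders standing in for the real scraped data and CLI structure.

```ruby
# Hypothetical scraped data plugged into a minimal CLI menu.
episodes = ["Pilot", "To Market, to Market", "Requiem for a Lightweight"]

# Format the list as a numbered menu, one title per line.
def format_episodes(episodes)
  episodes.each_with_index.map { |title, i| "#{i + 1}. #{title}" }.join("\n")
end

puts "M*A*S*H Episodes:"
puts format_episodes(episodes)
```

From there the CLI just loops on user input (gets.chomp) and prints whichever slice of the scraped data the user asks for.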