Step 1: Complete Your Initial Scraping Using Jupyter


Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter. Create a Jupyter Notebook file called mission_to_mars.ipynb and use it to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

NASA Mars News

Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

Example:

    news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"
    news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."

JPL Mars Space Images - Featured Image

Visit the url for the JPL Featured Space Image. Use Splinter to navigate the site, find the image url for the current Featured Mars Image, and assign the url string to a variable called featured_image_url. Make sure to find the image url for the full-size .jpg image, and make sure to save a complete url string for this image.

Example:

    featured_image_url = '...'

Mars Weather

Visit the Mars Weather Twitter account and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report in a variable called mars_weather.

Note: Be sure you are not signed in to Twitter, or scraping may become more difficult.

Note: Twitter frequently changes how information is presented on its website. If you are having difficulty getting the correct html tag data, consider researching Regular Expression Patterns and how they can be used in combination with the .find() method.

Example:

    mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'

Mars Facts

Visit the Mars Facts webpage and use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc. Use Pandas to convert the data to an HTML table string.

Mars Hemispheres

Visit the USGS Astrogeology site to obtain high-resolution images for each of Mars's hemispheres. You will need to click each of the hemisphere links in order to find the image url for the full-resolution image. Save both the image url string for the full-resolution hemisphere image and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title. Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

Example:

    hemisphere_image_urls = [
        {"title": "Valles Marineris Hemisphere", "img_url": "..."},
        {"title": "Cerberus Hemisphere", "img_url": "..."},
        {"title": "Schiaparelli Hemisphere", "img_url": "..."},
        {"title": "Syrtis Major Hemisphere", "img_url": "..."},
    ]

Paper for the Above Instructions

The exploration of Mars has long fascinated scientists and the general public alike, prompting the development of a comprehensive web scraping project to gather and analyze diverse data sets about the Red Planet. This project leverages Python libraries such as BeautifulSoup, Pandas, Requests, and Splinter to automate the extraction of current Mars news, images, weather data, facts, and high-resolution hemisphere images. The ultimate goal is to compile this information into a structured format suitable for a web application utilizing Flask and MongoDB, providing an interactive and informative Mars dashboard.

1. Scraping the Latest Mars News:

The first step involves scraping the NASA Mars News Site to extract the most recent news title and paragraph. Splinter navigates to the page and renders it, and BeautifulSoup then parses the HTML to locate the appropriate tags. The title and paragraph text are stored in variables such as news_title and news_p. This data provides users with the latest updates on Mars exploration missions.
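A minimal sketch of this step follows. The news URL and the CSS class names (list_text, content_title, article_teaser_body) are assumptions about how the site has historically been laid out and may need adjusting against the live page:

    from splinter import Browser
    from bs4 import BeautifulSoup

    # Launch a headless browser; splinter locates a chromedriver on the PATH
    # (older splinter/selenium versions may need an executable_path argument).
    browser = Browser('chrome', headless=True)
    browser.visit('https://mars.nasa.gov/news/')  # assumed news URL

    # Parse the rendered page and grab the first (latest) article entry.
    soup = BeautifulSoup(browser.html, 'html.parser')
    article = soup.find('div', class_='list_text')  # assumed container class
    news_title = article.find('div', class_='content_title').get_text(strip=True)
    news_p = article.find('div', class_='article_teaser_body').get_text(strip=True)

    browser.quit()

Because the article list is populated by JavaScript, waiting briefly before parsing (for example with browser.is_element_present_by_css) can make the scrape more reliable.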

2. Fetching the Featured Mars Image:

Next, the process involves visiting the Jet Propulsion Laboratory (JPL) Mars Space Images website. Using Splinter, the script navigates the page to locate the current featured image. Upon finding the image element, the code extracts the URL linking to the full-size image and constructs a complete URL string, stored in featured_image_url. This image serves as a visual highlight for the Mars dashboard.
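A sketch under similar assumptions: the JPL page URL and the idea that the featured image appears as an inline background-image style on an article.carousel_item element are both guesses about the page layout and should be verified in the browser's inspector:

    from urllib.parse import urljoin

    from splinter import Browser
    from bs4 import BeautifulSoup

    jpl_url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'  # assumed URL

    browser = Browser('chrome', headless=True)
    browser.visit(jpl_url)

    soup = BeautifulSoup(browser.html, 'html.parser')
    # Assumed layout: the featured image is set as an inline CSS background,
    # e.g. style="background-image: url('/spaceimages/images/....jpg');"
    article = soup.find('article', class_='carousel_item')
    relative_path = article['style'].split("url('")[1].split("')")[0]

    # urljoin turns the relative path into the complete url string required.
    featured_image_url = urljoin(jpl_url, relative_path)

    browser.quit()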

3. Extracting Mars Weather Data from Twitter:

The third task involves scraping the latest weather tweet from the Mars Weather Twitter account. Since Twitter's dynamic content can pose challenges, the scraper must identify the specific HTML tags or utilize regular expressions to reliably extract the latest weather information. The resulting text is stored in mars_weather, offering real-time insights into Mars' atmospheric conditions.
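One hedged approach, sketched below, is to search the page text with a compiled regular expression instead of relying on a specific tag or class, since Twitter's markup changes frequently. The account URL and the pattern (a tweet beginning with "Sol <number>" and ending with a pressure reading) are assumptions:

    import re

    from splinter import Browser
    from bs4 import BeautifulSoup

    browser = Browser('chrome', headless=True)
    browser.visit('https://twitter.com/marswxreport?lang=en')  # assumed account URL

    soup = BeautifulSoup(browser.html, 'html.parser')

    # Match text that looks like a weather report, e.g. "Sol 1801 (...) ... hPa".
    pattern = re.compile(r'Sol\s\d+\s\(.*?\).*?hPa', re.DOTALL)

    # BeautifulSoup's find() accepts a compiled pattern, which sidesteps
    # brittle class names in Twitter's markup.
    tweet_text = soup.find(string=pattern)
    mars_weather = pattern.search(tweet_text).group(0) if tweet_text else None

    browser.quit()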

4. Collecting Mars Facts in an HTML Table:

Using Pandas, the script reads the Mars Facts webpage, extracting the facts table that includes data such as diameter and mass. The table is then converted into an HTML string, which can be embedded directly into the web application. This structured data supports comparative analysis and educational purposes.
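Because pandas.read_html returns one DataFrame per <table> element it finds, the facts table can be pulled and serialized in a few lines. The URL and the assumption that the facts table is the first table on the page are both guesses:

    import pandas as pd

    facts_url = 'https://space-facts.com/mars/'  # assumed URL; the original omits the link

    # read_html returns a list of DataFrames, one per <table> on the page;
    # the Mars facts table is assumed to be the first.
    df = pd.read_html(facts_url)[0]
    df.columns = ['Description', 'Value']
    df = df.set_index('Description')

    # Serialize to an HTML table string for embedding in the Flask template.
    mars_facts_html = df.to_html(classes='table table-striped')

Note that read_html needs an HTML parser such as lxml or html5lib installed in order to work.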

5. Gathering High-Resolution Hemisphere Images:

Finally, the scraper visits the USGS Astrogeology site to obtain high-resolution images of Mars' hemispheres. The script navigates through each hemisphere link, accesses the full-resolution image, and captures both the image URL and the hemisphere's title. Each set of data is stored in a dictionary with keys img_url and title, and appended to a list named hemisphere_image_urls. This collection enables users to explore detailed visuals of Mars' surface features.
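A sketch of the hemisphere loop, assuming the USGS search URL below, that each product is listed under a div.item with an h3 title, and that each detail page exposes a "Sample" link pointing at the full-resolution .jpg (all layout details should be verified against the live site):

    from urllib.parse import urljoin

    from splinter import Browser
    from bs4 import BeautifulSoup

    # Assumed search URL; the original omits the link.
    usgs_url = ('https://astrogeology.usgs.gov/search/results'
                '?q=hemisphere+enhanced&k1=target&v1=Mars')

    browser = Browser('chrome', headless=True)
    browser.visit(usgs_url)

    soup = BeautifulSoup(browser.html, 'html.parser')
    # Collect the hemisphere titles first, then visit each detail page in turn.
    titles = [h3.get_text(strip=True) for h3 in soup.select('div.item h3')]

    hemisphere_image_urls = []
    for title in titles:
        browser.links.find_by_partial_text(title).click()
        detail = BeautifulSoup(browser.html, 'html.parser')
        sample = detail.find('a', string='Sample')  # assumed link to the full-resolution image
        img_url = urljoin(browser.url, sample['href'])
        hemisphere_image_urls.append({'title': title.replace(' Enhanced', ''),
                                      'img_url': img_url})
        browser.back()

    browser.quit()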

Once all data collection tasks are completed, the information is assembled into a Python dictionary within a function named scrape. This function returns the dictionary for storage in MongoDB. Subsequently, a Flask web app defines routes to trigger scraping, store data, and render the data dynamically into an HTML template, creating an engaging Mars facts dashboard.
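A condensed sketch of this wiring is shown below. It assumes the notebook logic has been moved into a module named scrape_mars that exposes a scrape() function returning the dictionary described above, that MongoDB is running locally on the default port, and that a templates/index.html file exists; all of those names are assumptions rather than part of the original spec:

    # app.py -- minimal Flask app that stores and serves the scraped data
    from flask import Flask, redirect, render_template
    from pymongo import MongoClient

    import scrape_mars  # hypothetical module exposing scrape() -> dict

    app = Flask(__name__)
    # Assumed database and collection names on a local MongoDB instance.
    collection = MongoClient('mongodb://localhost:27017').mars_db.mars

    @app.route('/')
    def index():
        # Render the most recently stored document into the dashboard template.
        return render_template('index.html', mars=collection.find_one())

    @app.route('/scrape')
    def scrape():
        # Re-run the scrape and upsert the result as a single document.
        collection.update_one({}, {'$set': scrape_mars.scrape()}, upsert=True)
        return redirect('/')

    if __name__ == '__main__':
        app.run(debug=True)

Hitting the /scrape route refreshes the stored document, while the root route always renders whatever is currently in the collection.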
