# BookScraperExcel
BookScraperExcel is a Python web scraper that collects book data from Amazon.com. It uses the `requests`, `beautifulsoup4`, and `pandas` libraries, together with the standard-library `random` and `time` modules, to make HTTP requests, parse the returned HTML, and export the results to an Excel file.
The scraper can cope with blocked requests and rotate through multiple proxies, which makes long scraping runs more reliable. It also exposes an application programming interface (API) through the `API` class, which returns the book data as a list of dictionaries and can export it to an Excel file for further analysis.
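The README does not show how block handling and proxy rotation work internally. The sketch below illustrates one common approach using only `random` and `time` plus `requests`, under loudly stated assumptions: a block is taken to mean HTTP 429/503 or a page mentioning a CAPTCHA, and `fetch_with_proxies`, `looks_blocked`, `USER_AGENTS`, and the retry parameters are hypothetical names, not the project's actual API.

```python
import random
import time

# Assumption: these status codes, or a CAPTCHA page, indicate that Amazon is blocking us.
BLOCK_STATUS_CODES = {429, 503}

# Hypothetical User-Agent pool; replace with your own values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def looks_blocked(response) -> bool:
    """Heuristic block check (an assumption, not documented Amazon behavior)."""
    return (
        response.status_code in BLOCK_STATUS_CODES
        or "captcha" in response.text.lower()
    )


def fetch_with_proxies(url, proxies, get=None, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Fetch url, picking a random proxy per attempt and backing off when blocked.

    `get` defaults to requests.get; it and `sleep` are injectable so the
    retry logic can be exercised without network access.
    """
    if get is None:
        import requests  # imported lazily so the sketch runs without requests installed
        get = requests.get

    for attempt in range(max_attempts):
        proxy = random.choice(proxies)
        try:
            response = get(
                url,
                proxies={"http": proxy, "https": proxy},
                headers={"User-Agent": random.choice(USER_AGENTS)},
                timeout=10,
            )
        except OSError:
            # requests' exceptions subclass OSError (IOError), so a dead proxy lands here.
            response = None
        if response is not None and not looks_blocked(response):
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff with random jitter before trying another proxy.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError(f"{url} still blocked after {max_attempts} attempts")
```

For example, `fetch_with_proxies(BASE_URL, ["http://203.0.113.10:8080", "http://203.0.113.11:8080"])` tries a random proxy from the (placeholder) list on each attempt and waits roughly 1, 2, then 4 seconds between blocked attempts.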
## Installation

- Clone the repository: `git clone https://github.com/GitProSolutions/BookScraperExcel.git`
- Install the required dependencies by running `pip install -r requirements.txt` in the project directory.
- Run the `main.py` file in a Python environment with the required dependencies installed.
## Usage

- In `main.py`, set the `BASE_URL` variable to the Amazon search results page of your choice.
- Set the `num_pages` variable to the number of results pages you want to scrape.
- Optionally, set the `API_KEY` and `API_SECRET` variables to your own values.
- Run `main.py` and wait for the book data to be scraped and exported to an Excel file.
- To use the API, make a GET request to `http://localhost:5000/api/v1/books?key=YOUR_API_KEY`, replacing `YOUR_API_KEY` with the value of the `API_KEY` variable in `main.py`.
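The usage steps above can be sketched from the client side with the standard library alone. This assumes that paging through results works by setting Amazon's `page` query parameter on `BASE_URL`, and that the API serializes its list of dictionaries as JSON; `page_urls`, `books_api_url`, and `fetch_books` are hypothetical helpers, not functions from `main.py`.

```python
from __future__ import annotations

import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def page_urls(base_url: str, num_pages: int) -> list[str]:
    """Return one URL per results page by setting a `page` query parameter (assumed)."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # existing parameters, e.g. the search keywords
    urls = []
    for page in range(1, num_pages + 1):
        query["page"] = str(page)  # overwrites any page number already in base_url
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


def books_api_url(api_key: str, host: str = "http://localhost:5000") -> str:
    """Build the /api/v1/books endpoint URL with the key URL-encoded."""
    return f"{host}/api/v1/books?{urlencode({'key': api_key})}"


def fetch_books(api_key: str, host: str = "http://localhost:5000") -> list[dict]:
    """GET the endpoint, assuming the server responds with a JSON list of book objects."""
    with urlopen(books_api_url(api_key, host), timeout=10) as response:
        return json.load(response)
```

With the server from `main.py` running, `fetch_books("YOUR_API_KEY")` would return the decoded list, and `page_urls(BASE_URL, num_pages)` shows the kind of page URLs a scraper could iterate over.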
## Credits

This project was built by GitProSolutions as a learning exercise in web scraping, API development, and Python programming.
## License

This project is licensed under the MIT License. See the LICENSE file for more details.