This project aggregates public data from various websites, providing a centralized source of information for easy access and analysis.
How does it work?
The project has two main components:
1. Client: This part is responsible for preparing and configuring requests.
It defines what kind of request is needed (e.g., HTTP, HTTPS, controlled browser request, or WebSocket) and
sets any specific options, such as headers, proxies, payload, or connection parameters. However, the client itself does not execute the request; it only prepares it.
2. Server: This component handles the actual execution of requests configured by the client.
It takes the request details from the client and then performs the request, whether it's a simple HTTP/HTTPS call, a controlled browser interaction, or a WebSocket connection.
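The client/server split above can be sketched roughly as follows. Note that `RequestSpec`, the class names, and the field names here are illustrative assumptions, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RequestSpec:
    # Prepared by the client; executed later by the server.
    kind: str                     # "http", "https", "browser", or "websocket"
    url: str
    method: str = "GET"
    headers: dict = field(default_factory=dict)
    proxy: Optional[str] = None
    payload: Optional[dict] = None

class Client:
    """Prepares and configures requests but never executes them."""
    def prepare(self, url: str, *, kind: str = "http", **options) -> RequestSpec:
        return RequestSpec(kind=kind, url=url, **options)

class Server:
    """Executes the specs handed over by the client."""
    def execute(self, spec: RequestSpec) -> str:
        # Dispatch on the request kind; real transports (an HTTP library,
        # a controlled browser, a WebSocket client) would plug in here.
        if spec.kind in ("http", "https"):
            return f"HTTP {spec.method} {spec.url}"
        if spec.kind == "browser":
            return f"browser visit {spec.url}"
        if spec.kind == "websocket":
            return f"ws connect {spec.url}"
        raise ValueError(f"unknown request kind: {spec.kind}")
```

The key point the sketch captures is that the `Client` only builds a data object, so specs can be queued, logged, or shipped to a remote `Server` before anything is executed.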
Why Docker?
Docker streamlines project deployment by ensuring consistency across environments.
With Docker, we bundle the application along with all its dependencies so there are no unexpected issues with package versions or system configurations.
This setup makes it easy to upload and run the project reliably.
Docker’s lightweight containers also make scaling straightforward: if we need more processing power, we can deploy the container to a more powerful machine or scale out with additional containers.
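A minimal Compose file for this setup might look like the sketch below; the service names, build paths, images, and ports are assumptions for illustration, not the project's actual configuration:

```yaml
services:
  server:
    build: ./server          # executes the requests prepared by clients
    depends_on:
      - redis
      - mongo
  frontend:
    build: ./frontend        # Next.js UI for task statuses
    ports:
      - "3000:3000"
  redis:
    image: redis:7           # shared proxy-status store
  mongo:
    image: mongo:7           # storage for collected data
```

Scaling out is then a single command, e.g. `docker compose up --scale server=3`.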
Why Next.js?
With Next.js, we can visualize task statuses dynamically, displaying real-time updates directly in the user interface.
It also lets users edit task options, such as starting or stopping a task or editing the seasons it covers.
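On the backend, the task controls the UI exposes (start, stop, edit options) can be pictured as a small registry like the sketch below; the `TaskRegistry` class and its fields are hypothetical, for illustration only:

```python
class TaskRegistry:
    """In-memory view of scraping tasks, as a UI might poll it."""

    def __init__(self):
        self.tasks = {}  # task_id -> {"status": ..., "options": ...}

    def create(self, task_id, **options):
        self.tasks[task_id] = {"status": "stopped", "options": options}

    def start(self, task_id):
        self.tasks[task_id]["status"] = "running"

    def stop(self, task_id):
        self.tasks[task_id]["status"] = "stopped"

    def edit(self, task_id, **options):
        # e.g. change which seasons a task should cover
        self.tasks[task_id]["options"].update(options)

    def status(self, task_id):
        return self.tasks[task_id]["status"]
```

The Next.js frontend would then fetch these statuses (for instance via polling or a WebSocket) and re-render as they change.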
Why Redis?
Redis is ideal for managing proxy statuses because it's an in-memory database, offering high-speed access to frequently updated data.
By centralizing proxy information in Redis, we ensure that all scrapers pull from the same, up-to-date set of proxies, making it easier to manage and monitor proxy usage
across different scraping tasks. This setup helps streamline resource sharing and improves scraping operations.
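The shared proxy pool can be managed with a handful of Redis set commands. The sketch below uses the redis-py style calls (`sadd`, `srem`, `srandmember`) against a tiny in-memory stand-in so it runs without a Redis server; the key names are assumptions:

```python
import random

class FakeRedis:
    """Minimal in-memory stand-in for the few Redis set commands used here.
    Against a real server this object would simply be redis.Redis()."""

    def __init__(self):
        self.sets = {}

    def sadd(self, key, *members):
        self.sets.setdefault(key, set()).update(members)

    def srem(self, key, *members):
        self.sets.get(key, set()).difference_update(members)

    def srandmember(self, key):
        pool = self.sets.get(key)
        return random.choice(sorted(pool)) if pool else None

GOOD, BAD = "proxies:good", "proxies:bad"  # assumed key names

def get_proxy(r):
    """Every scraper draws from the same shared, up-to-date pool."""
    return r.srandmember(GOOD)

def mark_bad(r, proxy):
    """Take a failing proxy out of rotation for all scrapers at once."""
    r.srem(GOOD, proxy)
    r.sadd(BAD, proxy)
```

Because all scrapers go through the same keys, one worker marking a proxy bad immediately removes it from every other worker's rotation.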
Why MongoDB?
When storing collected data, the choice of database depends significantly on how frequently the data is updated. The storage layer itself is flexible, however, so we can write the data to any database we choose.
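For frequently updated records, an upsert pattern is a common fit: insert the document if it is new, otherwise overwrite its fields in place. The helper below builds a PyMongo-style filter/update pair (the collection and field names are assumptions), and the commented lines show how it would be applied with `update_one(..., upsert=True)`:

```python
def build_upsert(match_fields: dict, doc: dict):
    """Return a (filter, update) pair for an upsert: match on the given
    fields, and set the document's fields whether or not a record exists."""
    return match_fields, {"$set": doc}

# With a real MongoDB connection it would be applied like this:
# from pymongo import MongoClient
# coll = MongoClient("mongodb://localhost:27017")["scraper"]["items"]  # assumed names
# flt, upd = build_upsert({"source_url": item["source_url"]}, item)
# coll.update_one(flt, upd, upsert=True)
```

Matching on a stable field such as the source URL means re-scraping a page refreshes the existing record instead of creating duplicates.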