Published on Sep 03, 2023
We consider the problem of multiple lightweight devices monitoring unstructured data on the web. To address it, we propose a 3-tiered architecture for a consolidated system that monitors changes in unstructured web data on behalf of these clients (lightweight devices).
Typically, our clients operate with constrained resources such as memory, network bandwidth, and processing power.
This limits their ability to run applications that require timely and relevant web data, such as news readers, offline/online search engines, directories, or general web-page monitoring tools.
The proposed system can deliver relevant and timely data to its clients by adapting to their profiles and usage statistics, which it uses to crawl and monitor the web effectively.
Existing literature covers numerical and textual updates in either the monitoring or the crawling context. However, there is little or no existing work on optimizing overall monitoring in a way that accounts for both textual content and client-side constraints. We formalize this overall optimization problem to some extent, and propose modules in each layer of our architecture that work together to optimize our objective.
This objective accounts for both user satisfaction and resource costs. Our layered architecture includes modules that manage novelty detection, workload, monitoring and scheduling, client profiles, client usage statistics, data packaging and delivery, and client connections, among others. We also propose ways to evaluate our architecture under several variations in how some of these modules are managed, such as crawling, client connections, and delivered data size.