COBERT

Online Bad Event Reporting Tool

OVERVIEW

COBERT is an acronym for ONLINE BAD EVENT REPORTING TOOL. It is a suite of tools used to collect, parse, filter and report data from multiple sources. The three primary tools are:

  • COBERT
  • Twitter Tool
  • RSS Reader

COBERT scrapes data from a very large number of websites around the world. These range from sites that report seismic events, news stations, and news about ports and highways, to the police departments of multiple countries, states and counties, and weather-forecasting websites. The list is not exhaustive, but it shows how varied the scraped sites are. Once COBERT has collected data, it checks it against filters created by analysts so that irrelevant or uninteresting data is discarded. If the data matches the filters set for a category of sites, COBERT emails the analysts. If a site is later updated, the differences are highlighted and the updated information is sent as well.
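A minimal sketch of the filter-then-diff flow described above, assuming analyst filters are regular expressions grouped by site category. The names FILTERS, matching_categories and highlight_changes, and the example patterns, are hypothetical and not taken from COBERT's code base; sending the email is left out. It uses only Python's standard re and difflib modules.

```python
import difflib
import re

# Hypothetical analyst filters: category -> regex patterns (not COBERT's real config).
FILTERS = {
    "seismic": [r"\bearthquake\b", r"\bmagnitude\s+[5-9]"],
    "ports": [r"\bport\b.*\bclos(ed|ure)\b"],
}


def matching_categories(text, filters=FILTERS):
    """Return the sorted categories with at least one pattern matching the text."""
    hits = set()
    for category, patterns in filters.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            hits.add(category)
    return sorted(hits)


def highlight_changes(old_text, new_text):
    """Return only the removed ('-') and added ('+') lines between two scrapes."""
    lines = list(
        difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm="", n=0)
    )
    # The first two lines are the '---'/'+++' headers; '@@' lines mark each hunk.
    return [line for line in lines[2:] if not line.startswith("@@")]


print(matching_categories("Magnitude 6.1 earthquake reported near the coast"))
print(highlight_changes("a\nb\nc", "a\nB\nc"))
```

Only the lines that changed between scrapes are kept, which is the part an analyst would want highlighted in an update email.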

The Twitter Tool filters and matches relevant keywords in the same way as COBERT, but it collects its data from the Twitter feeds of relevant Twitter handles.
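As a rough illustration, the handle-plus-keyword filtering could look like the sketch below. It assumes tweets have already been received (for example from the streaming API) and reduced to a handle and text; the Tweet class, the handle names and the keyword list are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    handle: str
    text: str


# Hypothetical configuration: handles to follow and keywords analysts care about.
WATCHED_HANDLES = {"example_seismic", "example_port"}
KEYWORDS = ["earthquake", "closure", "evacuation"]


def relevant_tweets(tweets, handles=WATCHED_HANDLES, keywords=KEYWORDS):
    """Keep tweets that come from a watched handle AND mention a keyword."""
    # Whole-word, case-insensitive match on any keyword.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    return [t for t in tweets if t.handle.lower() in handles and pattern.search(t.text)]


sample = [
    Tweet("Example_Seismic", "M5.2 earthquake felt downtown"),
    Tweet("example_port", "Normal operations today"),
    Tweet("random_user", "earthquake!"),
]
print([t.handle for t in relevant_tweets(sample)])
```

Requiring both conditions keeps noise from unrelated accounts out, even when they use the same keywords.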

The RSS Reader filters and matches data in much the same way as the other tools, but it reads the RSS feeds of sites that publish them.
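The sketch below shows the general shape of reading an RSS 2.0 feed and keeping items that mention a keyword. The project itself uses the feedparser library; this version swaps in the standard library's xml.etree.ElementTree so it runs without dependencies. The function names and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Yield (title, link, description) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        )


def matching_items(xml_text, keywords):
    """Return (title, link) for items whose title or description mentions a keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        (title, link)
        for title, link, desc in rss_items(xml_text)
        if any(k in (title + " " + desc).lower() for k in wanted)
    ]


SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Highway 9 closed</title><link>https://example.com/1</link><description>Rockslide</description></item>
<item><title>Weekly roundup</title><link>https://example.com/2</link><description>Nothing notable</description></item>
</channel></rss>"""

print(matching_items(SAMPLE_FEED, ["closed"]))
```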

TECHNOLOGY

Because COBERT is a suite of tools written over several years, the tools use different frameworks and technologies. The primary web frameworks are Django and CodeIgniter. For scraping, the main libraries are BeautifulSoup, FuzzyWuzzy, SoupSieve and Selenium. For matching and filtering, besides regular expressions and FuzzyWuzzy, a range of other techniques are used, including hash matching, sentiment matching and categorization of tweets with scikit-learn. Tweets are fetched with Twitter's Streaming APIs, and RSS feeds are parsed with Python's feedparser library.
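To illustrate how hash matching and fuzzy matching can work together to suppress duplicates, here is a minimal sketch. It assumes a normalised SHA-256 hash for exact duplicates and swaps in the standard library's difflib.SequenceMatcher, scaled to 0-100 in the spirit of FuzzyWuzzy's fuzz.ratio, for near duplicates. The function names and the threshold of 90 are hypothetical choices, not values from the project.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(text):
    """SHA-256 of whitespace-normalised, lower-cased text (exact-duplicate check)."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def similarity(a, b):
    """0-100 similarity score; a stdlib stand-in for a FuzzyWuzzy-style ratio."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def is_new(text, seen_hashes, recent_texts, threshold=90):
    """Treat text as new unless it is an exact or near duplicate of something seen."""
    if content_hash(text) in seen_hashes:
        return False
    return all(similarity(text, old) < threshold for old in recent_texts)


seen = {content_hash("Port closed")}
print(is_new("port   CLOSED", seen, []))
print(is_new("Bridge reopened", set(), ["Port closed"]))
```

The cheap hash lookup runs first; the slower pairwise fuzzy comparison only runs for text whose hash has not been seen.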

Besides the matching and filtering libraries, several apps from the Django ecosystem are used, including django-guardian, django-filter, django-pagination, django-rest-framework and django-vanilla-views. Django's admin is used heavily to manage Sites, Scrapers, Users and Keywords.

For testing, we use pytest, model-mommy and the mocking features of Python's unittest module (unittest.mock), along with Selenium, throughout the code base.
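A small example of the mocking style mentioned above: a pytest-discoverable test that replaces the network call with unittest.mock so a scraper helper can be tested offline. The fetch_title helper is hypothetical and stands in for any function that downloads a page.

```python
import urllib.request
from unittest import mock


def fetch_title(url):
    """Hypothetical scraper helper: download a page and return its <title> text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8")
    start = html.find("<title>")
    end = html.find("</title>", start)
    if start == -1 or end == -1:
        return ""
    return html[start + len("<title>"):end].strip()


def test_fetch_title_without_network():
    fake_resp = mock.MagicMock()
    fake_resp.read.return_value = b"<html><title> Quake alert </title></html>"
    # urlopen is used as a context manager, so __enter__ must return the fake response.
    fake_resp.__enter__.return_value = fake_resp
    with mock.patch("urllib.request.urlopen", return_value=fake_resp) as fake_open:
        assert fetch_title("https://example.com") == "Quake alert"
    fake_open.assert_called_once_with("https://example.com", timeout=10)
```

Patching urllib.request.urlopen works here because fetch_title looks the function up on the module at call time.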