digitalmethodsinitiative/4cat_web_studies_extensions

Web Studies: a 4CAT Extension

Web Studies is a companion extension to the 4CAT Capture and Analysis Toolkit. It adds functionality to 4CAT by using Selenium with a Firefox browser to collect data from web sources.

Features

New datasources

General web studies

  • Selenium URL Collector
    • Collect HTML, text, and links from a list of URLs
  • Web Archive Collector
  • Screenshot Generator
    • Take screenshots of web pages
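The URL collector itself drives Firefox through Selenium, and the sketch below does not reproduce its code. It only illustrates, assuming a page's HTML has already been retrieved, how visible text and absolute link targets might be separated from that HTML with Python's standard-library `html.parser`. The names `LinkTextParser` and `extract_text_and_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextParser(HTMLParser):
    """Hypothetical helper: gather visible text and absolute links from one page."""

    # Content of these elements is not shown to readers, so it is skipped.
    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs against the URL the page was collected from.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_text_and_links(html: str, base_url: str) -> dict:
    parser = LinkTextParser(base_url)
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "links": parser.links}
```

For example, `extract_text_and_links('<p>Hello <a href="/about">About</a></p>', "https://example.com/index.html")` resolves the relative link to `https://example.com/about`.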

App store studies

  • Apple Store
  • Google Store

Cloud app store studies

  • Microsoft Azure App Store
  • Amazon Web Services (AWS) Marketplace
    • Collect data on AWS applications

New analysis processors

  • Take screenshots of any column containing URLs
  • Detect trackers
    • Provide a list of source-code snippets to search for in the collected HTML
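The idea behind tracker detection can be sketched as a substring search: each tracker is described by a few code snippets, and a page matches when any snippet occurs in its HTML. This is a minimal sketch, not the processor's actual matching logic. The function name, the case-insensitive matching, and the entries in `EXAMPLE_SIGNATURES` are illustrative assumptions.

```python
# Illustrative signatures only; a real list would be supplied by the user.
EXAMPLE_SIGNATURES = {
    "Google Analytics": ["googletagmanager.com/gtag/js", "google-analytics.com/analytics.js"],
    "Meta Pixel": ["connect.facebook.net"],
}


def detect_trackers(html: str, signatures: dict[str, list[str]]) -> dict[str, list[str]]:
    """Hypothetical helper: map each tracker to the snippets found in the HTML."""
    haystack = html.lower()  # assumption: matching is case-insensitive
    found = {}
    for tracker, snippets in signatures.items():
        matches = [s for s in snippets if s.lower() in haystack]
        if matches:
            found[tracker] = matches
    return found
```

Returning the matched snippets, rather than a plain yes/no, keeps a record of why a page was flagged.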

Installation

These extensions are designed to work with 4CAT v1.46 or later.
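When checking this requirement in a script, a plain string comparison is not enough, because "1.100" sorts before "1.46". The hypothetical helper below compares dotted version numbers as integer tuples instead. It assumes you already have the installed 4CAT version as a plain numeric string, optionally prefixed with "v"; how to read it from your installation is not covered, and pre-release suffixes are not handled.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.46' or 'v1.46.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def meets_minimum(installed: str, minimum: str = "1.46") -> bool:
    # Tuples compare element by element, so (1, 100) >= (1, 46) holds,
    # unlike the string comparison "1.100" >= "1.46".
    return parse_version(installed) >= parse_version(minimum)
```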

Docker installation

  1. Download/clone the extensions into both the 4CAT backend and frontend containers
  • docker exec 4cat_backend git clone https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git extensions/web_studies/
  • docker exec 4cat_frontend git clone https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git extensions/web_studies/
  2. Restart the 4CAT containers
  • Run docker compose restart from the 4CAT directory where the docker-compose.yml and .env files were previously downloaded
  • This will automatically install the necessary dependencies, Firefox, and Geckodriver
  3. Activate the desired new datasources from the 4CAT Control Panel
  • Control Panel -> Settings -> Data sources
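The first two Docker steps can be scripted. The sketch below is a hypothetical Python helper that builds the same `docker exec ... git clone ...` commands listed above plus `docker compose restart`. By default it only prints them (`dry_run=True`); with `dry_run=False` it runs them via `subprocess`, using `fourcat_dir` (an assumed parameter) as the working directory so that `docker compose` finds your `docker-compose.yml` and `.env` files. Activating datasources in the Control Panel remains a manual step.

```python
import subprocess

REPO_URL = "https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git"
TARGET_DIR = "extensions/web_studies/"
CONTAINERS = ("4cat_backend", "4cat_frontend")


def clone_commands(containers=CONTAINERS) -> list[list[str]]:
    """Build one `docker exec <container> git clone ...` command per container."""
    return [["docker", "exec", name, "git", "clone", REPO_URL, TARGET_DIR] for name in containers]


def install(fourcat_dir: str = ".", dry_run: bool = True) -> list[list[str]]:
    """Clone into both containers, then restart them (hypothetical helper)."""
    commands = clone_commands() + [["docker", "compose", "restart"]]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(cmd, check=True, cwd=fourcat_dir)
    return commands
```

Running `install()` first and reading the printed commands is a cheap way to confirm the container names match your setup before executing anything.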

Direct/manual installation

  1. Download or clone this repository and copy the folders into the extensions folder in your 4CAT directory
  • git clone https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git extensions/web_studies/
  2. Run 4CAT's migrate script to install the necessary packages
  • python helper-scripts/migrate.py
  • Note: fourcat_install.py is only designed to run on Linux systems. On other systems you will need to set up the following yourself:
    • Install the Python packages from requirements.txt
    • Download Firefox
    • Download a Geckodriver version compatible with that version of Firefox (https://github.com/mozilla/geckodriver/releases/)
    • Adjust the settings in the 4CAT interface via Control Panel -> Settings -> selenium to point to the Firefox/Geckodriver programs
  3. Activate the desired datasources from the 4CAT Control Panel
  • Control Panel -> Settings -> Data sources
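Before entering paths in the selenium settings on a non-Linux system, it can help to find where Firefox and Geckodriver actually live. The hypothetical helper below uses the standard-library `shutil.which` to look up each program on `PATH` and returns `None` for anything it cannot find. The executable names are assumptions: on macOS, for instance, Firefox is usually not on `PATH`, so you may need to pass the full path to the binary inside the application bundle as a candidate.

```python
import shutil


def find_browser_tools(candidates=None) -> dict:
    """Return the first resolvable path for each tool, or None if none is found."""
    if candidates is None:
        # Assumed default names; on Windows, shutil.which also tries PATHEXT suffixes such as .exe.
        candidates = {"firefox": ["firefox"], "geckodriver": ["geckodriver"]}
    found = {}
    for tool, names in candidates.items():
        # shutil.which returns None when a name cannot be resolved.
        found[tool] = next((path for path in map(shutil.which, names) if path), None)
    return found
```

Any entry that comes back as `None` still needs to be installed or located by hand before the datasources will work.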

About

Web Studies companion datasources and processors for 4CAT
