Components

The popularity analysis relies on several components.

StorageUsageDB

Despite its name, this is where both the StorageUsageAgent and the PopularityAgent store their data. It is exposed via the StorageUsageHandler and the DataUsageHandler.

StorageUsageAgent

This agent scans the DFC and stores the size and number of files per directory and per StorageElement in the StorageUsageDB.
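The per-directory, per-StorageElement aggregation described above can be sketched as follows. This is a minimal illustration, not the agent's actual code: the function name `summarize_replicas`, the input tuples, and the sample paths and StorageElement names are all hypothetical, and a plain in-memory dictionary stands in for the StorageUsageDB.

```python
from collections import defaultdict
import posixpath


def summarize_replicas(replicas):
    """Aggregate file count and total size per (directory, storage element).

    `replicas` is an iterable of (lfn, storage_element, size_in_bytes)
    tuples. This input shape is an assumption made for the example.
    """
    usage = defaultdict(lambda: {"files": 0, "size": 0})
    for lfn, storage_element, size in replicas:
        # The directory is the parent of the logical file name.
        key = (posixpath.dirname(lfn), storage_element)
        usage[key]["files"] += 1
        usage[key]["size"] += size
    return dict(usage)


# Hypothetical sample data: two files, one of them replicated on a second SE.
sample = [
    ("/lhcb/data/2012/run1/a.dst", "SE-A", 100),
    ("/lhcb/data/2012/run1/b.dst", "SE-A", 250),
    ("/lhcb/data/2012/run1/a.dst", "SE-B", 100),
]
summary = summarize_replicas(sample)
```

Keying on the (directory, StorageElement) pair keeps replicas of the same file on different storages in separate totals, which matches the "per directory and per StorageElement" granularity mentioned above.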

StorageHistoryAgent

This agent crawls the StorageUsageDB, converts each directory into a bookkeeping path, and fills in the following accounting:
  • Storage: space used/free per storage and/or directory
  • Data storage: space used per bookkeeping path
  • User storage: like Storage, but for user directories

DataUsageHandler

This service is called by the jobs to declare their use of a given directory. The usage is stored per directory and per day.
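To illustrate the "per directory and per day" granularity, here is a minimal sketch of such a usage record. The class `DirectoryUsageLog` and its methods are hypothetical names chosen for this example; they are not the DataUsageHandler API, and an in-memory counter replaces the database.

```python
from collections import Counter
from datetime import date


class DirectoryUsageLog:
    """Hypothetical in-memory stand-in for per-directory, per-day usage."""

    def __init__(self):
        self._counts = Counter()

    def declare_usage(self, directory, day, count=1):
        # Declarations for the same directory and day are accumulated.
        self._counts[(directory, day)] += count

    def usage(self, directory, day):
        # Unknown (directory, day) pairs report zero usage.
        return self._counts[(directory, day)]


log = DirectoryUsageLog()
log.declare_usage("/lhcb/data/2012/run1", date(2015, 6, 1))
log.declare_usage("/lhcb/data/2012/run1", date(2015, 6, 1), count=2)
log.declare_usage("/lhcb/data/2012/run1", date(2015, 6, 2))
```

In this sketch, several jobs declaring the same directory on the same day end up in a single counter entry, so the stored volume grows with the number of directories and days rather than with the number of jobs.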

PopularityAgent

This agent goes through the StorageUsageDB and creates accounting entries for the popularity. It also caches the BK dictionary for each directory in the StorageUsageDB.

DataPop server

A service provided by Yandex that consumes our popularity CSV and predicts which datasets to remove. It runs on our Mesos cluster: https://lbmesosms02.cern.ch/marathon/ui/#/apps/%2Fdatapopserv

PopularityAnalysisAgent

This agent creates two files:
  • one CSV containing a summary of the popularity (see the popularity.csv file).
  • one CSV, generated from the first one through the DataPop server
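As a rough illustration of the first file only, the sketch below turns per-directory, per-day usage counts into a CSV summary. The column names (`directory`, `total_accesses`, `days_used`), the function name, and the sample data are assumptions made for this example; the real popularity.csv layout is not described here and may differ. The second file, which comes from the DataPop server, is not covered.

```python
import csv
import io
from collections import defaultdict


def popularity_summary_csv(daily_counts):
    """Build a CSV summary from {(directory, day): count} usage data.

    Both the input shape and the output columns are hypothetical.
    """
    totals = defaultdict(int)
    days = defaultdict(set)
    for (directory, day), count in daily_counts.items():
        totals[directory] += count
        days[directory].add(day)

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["directory", "total_accesses", "days_used"])
    # Sort by directory so the output is deterministic.
    for directory in sorted(totals):
        writer.writerow([directory, totals[directory], len(days[directory])])
    return buffer.getvalue()


# Hypothetical usage counts keyed by (directory, ISO date string).
daily_counts = {
    ("/lhcb/a", "2015-06-01"): 3,
    ("/lhcb/a", "2015-06-02"): 1,
    ("/lhcb/b", "2015-06-01"): 2,
}
csv_text = popularity_summary_csv(daily_counts)
```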