Components¶

The popularity analysis relies on several components, each described below.

StorageUsageDB¶

Despite its name, this database is where both the StorageUsageAgent and the PopularityAgent store their data. It is exposed via the StorageUsageHandler and the DataUsageHandler.

StorageUsageAgent¶

This agent scans the DFC and stores the size and number of files per directory and per StorageElement in the StorageUsageDB.
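The aggregation described above can be pictured with a minimal sketch. It does not use the real agent or catalog API: the function name `summarize_replicas`, the record layout `(lfn, storage_element, size_bytes)`, and the paths and storage element names are all assumptions made for illustration.

```python
from collections import defaultdict
import posixpath


def summarize_replicas(replicas):
    """Aggregate file count and total size per (directory, storage element).

    `replicas` is an iterable of (lfn, storage_element, size_bytes) tuples.
    This layout is hypothetical; the real catalog exposes its own structures.
    """
    usage = defaultdict(lambda: {"files": 0, "size": 0})
    for lfn, storage_element, size_bytes in replicas:
        # The directory is the parent path of the logical file name.
        directory = posixpath.dirname(lfn)
        entry = usage[(directory, storage_element)]
        entry["files"] += 1
        entry["size"] += size_bytes
    return dict(usage)


# Hypothetical catalog content, for illustration only.
replicas = [
    ("/vo/data/2012/file1.dst", "SE-A", 100),
    ("/vo/data/2012/file2.dst", "SE-A", 250),
    ("/vo/data/2012/file1.dst", "SE-B", 100),
    ("/vo/user/a/alice/ntuple.root", "SE-USER", 40),
]
usage = summarize_replicas(replicas)
```

Keying on the (directory, storage element) pair means a file replicated on two storage elements is counted once for each, which matches the "per directory and per StorageElement" granularity described above.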

StorageHistoryAgent¶

This agent crawls the StorageUsageDB, converts each directory into a bookkeeping path, and fills in the following accounting types:

Storage: space used/free per storage and/or directory
Data storage: space used per bookkeeping path
User storage: like Storage, but for user directories

DataUsageHandler¶

Jobs call this service to declare their use of a given directory. The usage is stored per directory and per day.
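As a rough illustration of the "per directory and per day" granularity, the sketch below keeps an in-memory counter keyed on both values. The class `DirectoryUsageLog` and its methods are hypothetical and are not the DataUsageHandler interface; the real service persists its records in the StorageUsageDB.

```python
from collections import Counter
from datetime import date


class DirectoryUsageLog:
    """In-memory stand-in for a per-directory, per-day usage store (hypothetical)."""

    def __init__(self):
        self._counts = Counter()

    def declare_usage(self, directory, day, count=1):
        # Declarations for the same directory on the same day accumulate.
        self._counts[(directory, day)] += count

    def usage_for(self, directory, day):
        # Counter returns 0 for a key that was never declared.
        return self._counts[(directory, day)]


log = DirectoryUsageLog()
log.declare_usage("/vo/data/2012", date(2020, 1, 1))
log.declare_usage("/vo/data/2012", date(2020, 1, 1), count=3)
log.declare_usage("/vo/data/2012", date(2020, 1, 2))
```

With these three declarations, the first day holds a total of four uses for the directory and the second day holds one.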

PopularityAgent¶

This agent goes through the StorageUsageDB and creates accounting entries for the popularity. It also caches the bookkeeping (BK) dictionary for each directory in the StorageUsageDB.

DataPop server¶

A service provided by Yandex that consumes our popularity CSV and predicts which datasets to remove. It runs on our Mesos cluster: https://lbmesosms02.cern.ch/marathon/ui/#/apps/%2Fdatapopserv

PopularityAnalysisAgent¶

This agent creates two files:

one CSV containing a summary of the popularity (see the popularity.csv file).
one CSV, generated from the first one through the DataPop server.
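The two-file flow above can be sketched as follows. Everything specific here is an assumption: the column names (`directory`, `n_accesses`, `size_bytes`, `remove_candidate`) are invented for illustration and are not the real popularity.csv schema, and the `predict` callable is a local stand-in for the DataPop server, whose actual interface is not described here.

```python
import csv
import io


def write_summary_csv(rows, handle):
    """Write the popularity summary (first CSV). Column names are hypothetical."""
    writer = csv.DictWriter(handle, fieldnames=["directory", "n_accesses", "size_bytes"])
    writer.writeheader()
    writer.writerows(rows)


def derive_prediction_csv(summary_text, predict, handle):
    """Build the second CSV from the first by passing each row to `predict`.

    `predict` stands in for the call to the prediction service.
    """
    reader = csv.DictReader(io.StringIO(summary_text))
    writer = csv.DictWriter(handle, fieldnames=["directory", "remove_candidate"])
    writer.writeheader()
    for row in reader:
        writer.writerow(
            {"directory": row["directory"], "remove_candidate": predict(row)}
        )


# Hypothetical summary rows, for illustration only.
rows = [
    {"directory": "/vo/data/a", "n_accesses": 12, "size_bytes": 1000},
    {"directory": "/vo/data/b", "n_accesses": 0, "size_bytes": 5000},
]

summary = io.StringIO()
write_summary_csv(rows, summary)

# Toy rule: flag directories that were never accessed.
predictions = io.StringIO()
derive_prediction_csv(
    summary.getvalue(), lambda row: int(row["n_accesses"]) == 0, predictions
)
result = list(csv.DictReader(io.StringIO(predictions.getvalue())))
```

Passing the predictor in as a callable keeps the CSV handling independent of how the prediction is obtained, so the toy rule could be swapped for a real client without touching the file logic.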