Popularity scanning

The popularity.csv file is produced by the PopularityAgent and stored in its work directory.

To produce the popularity.csv file, the scanning follows this algorithm:

1. List all the directories of the DFC (DIRAC File Catalog).
2. Convert each visible directory into a BK dictionary, taking the information from the StorageUsage if it is cached there, otherwise from the BK itself.
3. For each directory:
- get the number of PFNs and their size per SE from the StorageUsageDB;
- get the day-by-day usage of the directory from the DataUsage (which is stored in the StorageUsageDB) and group it by week.
4. Sum the size and the number of files per:
- directory
- storage type (Archive, Tape, Disk)
- StorageElement
- Site
5. Assign each dataset to a storage type (see below).
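The summation in the steps above can be sketched as follows. This is a minimal illustration, not the PopularityAgent code: the `SEUsage` record, the `aggregate` function and the directory-to-dataset mapping are hypothetical stand-ins for what the agent reads from the StorageUsageDB and the Bookkeeping, and sizes are kept in bytes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SEUsage:
    """Hypothetical usage row for one DFC directory on one SE (StorageUsageDB-like)."""
    directory: str
    se: str            # StorageElement name
    site: str          # Site hosting the StorageElement
    storage_type: str  # "Archive", "Tape" or "Disk"
    n_files: int       # number of PFNs of this directory on this SE
    size: int          # total size of those PFNs, in bytes


def _counter():
    # [number of files, size in bytes]
    return [0, 0]


def aggregate(rows: Iterable[SEUsage], dir_to_dataset: Dict[str, str]):
    """Sum number of files and size per dataset, broken down by
    directory, storage type, StorageElement and Site.

    dir_to_dataset maps a DFC directory to its Bookkeeping path, so several
    directories belonging to the same dataset are summed together.
    """
    # dataset -> grouping level -> key -> [n_files, size]
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(_counter)))
    for row in rows:
        dataset = dir_to_dataset.get(row.directory)
        if dataset is None:
            # Assumption: directories without a BK mapping are skipped.
            continue
        for level, key in (
            ("directory", row.directory),
            ("storage_type", row.storage_type),
            ("se", row.se),
            ("site", row.site),
        ):
            entry = totals[dataset][level][key]
            entry[0] += row.n_files
            entry[1] += row.size
    return totals
```

Nested dictionaries keep the four grouping levels independent, so a single pass over the rows fills every breakdown at once.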

popularity.csv file

This file is the output of the whole processing chain. Its fields are the following:

Name: full Bookkeeping path (e.g. /LHCb/Collision11/Beam3500GeV-VeloClosed-MagDown/RealData/Reco14/Stripping20r1/90000000/SEMILEPTONIC.DST)

Configuration: the configuration part of the path, i.e. DataType + Activity (e.g. /LHCb/Collision11)

ProcessingPass: the processing pass part of the path (e.g. /RealData/Reco14/Stripping20r1)

FileType: the file type, i.e. the last element of the path (e.g. SEMILEPTONIC.DST)
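Since Configuration, ProcessingPass and FileType are all slices of the Name column, they can be recovered from the Bookkeeping path. The sketch below assumes the layout of the example path, /DataType/Activity/Conditions/ProcessingPass.../EventType/FileType; `split_bk_path` is a hypothetical helper, and the `conditions` and `event_type` components are labels introduced here for illustration, not popularity.csv columns.

```python
from typing import NamedTuple


class BKPath(NamedTuple):
    configuration: str    # DataType + Activity, e.g. "/LHCb/Collision11"
    conditions: str       # assumed 3rd path element
    processing_pass: str  # everything between conditions and event type
    event_type: str       # assumed second-to-last path element
    file_type: str        # last path element


def split_bk_path(name: str) -> BKPath:
    """Split a full Bookkeeping path into its components (assumed layout)."""
    parts = name.strip("/").split("/")
    # 2 configuration parts + conditions + at least one pass + event type + file type
    if len(parts) < 6:
        raise ValueError(f"unexpected Bookkeeping path: {name!r}")
    return BKPath(
        configuration="/" + "/".join(parts[:2]),
        conditions=parts[2],
        processing_pass="/" + "/".join(parts[3:-2]),
        event_type=parts[-2],
        file_type=parts[-1],
    )
```

Applied to the example Name, this gives Configuration /LHCb/Collision11, ProcessingPass /RealData/Reco14/Stripping20r1 and FileType SEMILEPTONIC.DST.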

Type: a number depending on the type of data:

0: MC

1: Real Data

2: Dev simulation

3: Upgrade simulation

Creation-week: “week number” when this dataset was created (see below for details) (e.g. 104680)

NbLFN: number of LFNs in the dataset

LFNSize: size of the dataset in TB

NbDisk: number of replicas on disk. Note that replicas are counted, not files: if every LFN has two disk replicas, NbDisk = 2 * NbLFN.

DiskSize: effective size of the dataset on disk in TB, counting every disk replica (so it grows with the number of replicas)

NbTape: number of replicas on tape storage other than Archive

TapeSize: effective size on tape of the dataset in TB.

NbArchived: number of replicas on Archive storage.

ArchivedSize: effective size on Archive storage in TB

CERN, CNAF, ... (one column per T1 site): disk space used at each of these sites.

NbReplicas: average number of replicas on disk per LFN (NbDisk/NbLFN)

NbArchReps: average number of replicas on Archive per LFN (NbArchived/NbLFN)
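The replica columns are related by simple sums and ratios. The sketch below derives them from a hypothetical per-LFN view (`LFNReplicas`, with sizes already in TB); the agent itself works from per-SE totals in the StorageUsageDB, so this only illustrates how NbDisk, DiskSize and the two averages (NbDisk and NbArchived divided by NbLFN) relate. Returning 0.0 for an empty dataset is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LFNReplicas:
    """Hypothetical per-LFN replica summary (not an LHCbDIRAC class)."""
    size_tb: float  # size of the LFN in TB
    disk: int       # number of disk replicas
    tape: int       # number of tape replicas that are not Archive
    archive: int    # number of Archive replicas


def replica_columns(lfns: List[LFNReplicas]) -> Dict[str, float]:
    """Compute the replica-related popularity.csv columns for one dataset."""
    nb_lfn = len(lfns)
    nb_disk = sum(f.disk for f in lfns)
    nb_archived = sum(f.archive for f in lfns)
    return {
        "NbLFN": nb_lfn,
        "LFNSize": sum(f.size_tb for f in lfns),
        # Every replica is counted: two disk copies of each LFN give 2 * NbLFN.
        "NbDisk": nb_disk,
        "DiskSize": sum(f.size_tb * f.disk for f in lfns),
        "NbTape": sum(f.tape for f in lfns),
        "TapeSize": sum(f.size_tb * f.tape for f in lfns),
        "NbArchived": nb_archived,
        "ArchivedSize": sum(f.size_tb * f.archive for f in lfns),
        # Averages per LFN; 0.0 for an empty dataset is an assumption.
        "NbReplicas": nb_disk / nb_lfn if nb_lfn else 0.0,
        "NbArchReps": nb_archived / nb_lfn if nb_lfn else 0.0,
    }
```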

Storage: one of the following:

Active: if the production that created the dataset is Idle, Completed or Active

Archived: if there are no disk copies, only Archive ones

Tape: if the dataset file type is RAW or RDST

Disk: otherwise
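The Storage rules can be read as an ordered decision list. The sketch below assumes the rules are checked in the order listed, that the production statuses are spelled `Active`, `Idle` and `Completed`, and reads the Archived rule as "no disk copy but at least one Archive copy"; `storage_class` is a hypothetical helper.

```python
from typing import Optional

# Assumed spelling of the production statuses that make a dataset "Active".
ACTIVE_PRODUCTION_STATUSES = frozenset({"Active", "Idle", "Completed"})

# File types whose datasets are classified as "Tape".
TAPE_FILE_TYPES = frozenset({"RAW", "RDST"})


def storage_class(production_status: Optional[str], nb_disk: int,
                  nb_archived: int, file_type: str) -> str:
    """Return the Storage column value; rules are checked in the listed order."""
    if production_status in ACTIVE_PRODUCTION_STATUSES:
        return "Active"
    if nb_disk == 0 and nb_archived > 0:
        # No disk copy left, only Archive replicas.
        return "Archived"
    if file_type in TAPE_FILE_TYPES:
        return "Tape"
    return "Disk"
```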

FirstUsage: first time the dataset was used (in “week number”)

LastUsage: last time the dataset was used (in “week number”)

Now: current week number

1, 2, etc.: number of accesses during the last k weeks, with one column per value of k. These numbers are cumulative: accesses made 1 week ago are also counted in the column for 2 weeks ago, and so on.
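Given the weekly access counts grouped from the DataUsage, the cumulative columns and the FirstUsage/LastUsage values can be computed as below. This is a sketch: the input is a hypothetical dict mapping week number to number of accesses, and whether the current week counts as part of "the last k weeks" is an assumption (here it does, so column k covers weeks `now - k + 1` to `now`).

```python
from typing import Dict, Optional, Tuple


def cumulative_accesses(weekly_usage: Dict[int, int], now: int,
                        max_weeks: int) -> Dict[int, int]:
    """Return {k: number of accesses during the last k weeks} for k = 1..max_weeks.

    Assumption: column k covers week numbers now - k + 1 .. now (inclusive).
    """
    columns = {}
    running = 0
    for k in range(1, max_weeks + 1):
        # Add the week that enters the window when going from k - 1 to k weeks.
        running += weekly_usage.get(now - k + 1, 0)
        columns[k] = running
    return columns


def usage_span(weekly_usage: Dict[int, int]) -> Tuple[Optional[int], Optional[int]]:
    """Return (FirstUsage, LastUsage) as week numbers, or (None, None) if never used."""
    used = [week for week, count in weekly_usage.items() if count > 0]
    if not used:
        return None, None
    return min(used), max(used)
```

For instance, with `usage = {104680: 3, 104679: 2, 104677: 5}` and `now = 104680`, `cumulative_accesses(usage, 104680, 4)` returns `{1: 3, 2: 5, 3: 5, 4: 10}`: each column includes everything counted in the previous one, and `usage_span(usage)` returns `(104677, 104680)`.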

Week number

The week number provides an easy way to compare the ages of datasets. It is defined as year * 52 + week number.
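A minimal sketch of the week number computation follows. Which calendar defines "week number" is not stated here; using the ISO calendar (`datetime.date.isocalendar`) is an assumption, and `week_number` and `age_in_weeks` are hypothetical helpers. Under this assumption, 23 January 2013 falls in ISO week 4 of 2013 and gives 2013 * 52 + 4 = 104680, the value used in the Creation-week example.

```python
import datetime


def week_number(day: datetime.date) -> int:
    """Return year * 52 + week number for the given day.

    Assumption: "year" and "week number" are taken from the ISO calendar.
    """
    iso_year, iso_week, _ = day.isocalendar()
    return iso_year * 52 + iso_week


def age_in_weeks(creation_week: int, now: int) -> int:
    """Approximate age of a dataset in weeks, from two week numbers."""
    return now - creation_week
```

Because some ISO years have 53 weeks, the mapping is not strictly one-to-one: 31 December 2020 (week 53 of 2020) and 4 January 2021 (week 1 of 2021) both give 105093. Differences of week numbers are therefore approximate ages rather than exact counts of weeks.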