Popularity scanning¶
The popularity.csv file is produced by the PopularityAgent and stored in its work directory.
To produce the popularity.csv file, the scan follows this algorithm:
- List all the directories of the DFC
- Convert each visible directory into a BK dictionary, getting the information from the StorageUsage (if it is cached), or from the BK itself
- For each directory:
  - Get the number of PFNs and the size per SE from the StorageUsageDB
  - Get the day-by-day usage of the directory and group it by week, from the DataUsage (which is the StorageUsageDB)
- Sum the size and the number of files per:
  - directory
  - storage type (archive, tape, disk)
  - StorageElement
  - site
- Assign the dataset to a given storage type (see below)
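The summing step above can be sketched as follows. This is only an illustration, not the PopularityAgent code: the record layout `(storage_element, site, storage_type, n_files, size)`, the function name `summarize` and the StorageElement names in the example are all assumptions made for this sketch.

```python
from collections import defaultdict

def summarize(replicas):
    """Sum file counts and sizes per storage type, StorageElement and site.

    `replicas` is a hypothetical list of records for one directory:
    (storage_element, site, storage_type, n_files, size).
    """
    totals = {
        "type": defaultdict(lambda: [0, 0]),
        "se": defaultdict(lambda: [0, 0]),
        "site": defaultdict(lambda: [0, 0]),
    }
    for se, site, storage_type, n_files, size in replicas:
        for group, name in (("type", storage_type), ("se", se), ("site", site)):
            totals[group][name][0] += n_files
            totals[group][name][1] += size
    # Freeze the accumulators into plain dicts of (n_files, size) tuples.
    return {g: {n: tuple(v) for n, v in d.items()} for g, d in totals.items()}

# Illustrative data only: one directory with two disk replicas and one archive replica.
example = [
    ("CERN-DST", "CERN", "disk", 10, 1000),
    ("CNAF-DST", "CNAF", "disk", 10, 1000),
    ("CERN-ARCHIVE", "CERN", "archive", 10, 1000),
]
totals = summarize(example)
print(totals["type"]["disk"])   # files and size summed over all disk replicas
print(totals["site"]["CERN"])   # files and size summed over everything at CERN
```

Keeping one accumulator per grouping key means the same pass over the replica records fills the per-type, per-SE and per-site totals.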
popularity.csv file¶
This file is the output of the whole processing chain. The fields are the following:
- Name: full Bookkeeping path (e.g. /LHCb/Collision11/Beam3500GeV-VeloClosed-MagDown/RealData/Reco14/Stripping20r1/90000000/SEMILEPTONIC.DST)
- Configuration: Configuration part, so DataType + Activity (/LHCb/Collision11)
- ProcessingPass: processing pass part of the path (/RealData/Reco14/Stripping20r1)
- FileType: file type part of the path (SEMILEPTONIC.DST)
- Type: a number depending on the type of data:
  - 0: MC
  - 1: Real Data
  - 2: Dev simulation
  - 3: Upgrade simulation
- Creation-week: “week number” when this file was created (see below for details) (e.g. 104680)
- NbLFN: number of LFNs in the dataset
- LFNSize: size of the dataset in TB
- NbDisk: number of replicas on disk. Careful: if all LFNs have two replicas, you will have NbDisk=2*NbLFN.
- DiskSize: effective size on disk of the dataset in TB (also related to the number of replicas)
- NbTape: number of replicas on tape storage, not counting archive
- TapeSize: effective size on tape of the dataset in TB.
- NbArchived: number of replicas on Archive storage.
- ArchivedSize: effective size on Archive storage in TB
- CERN, CNAF, etc. (one column per T1 site): disk space used at that site.
- NbReplicas: average number of replicas on disk (NbDisk/NbLFN)
- NbArchReps: average number of replicas on Archive (NbArchived/NbLFN)
- Storage: one of the following:
  - Active: if the production creating this file is either idle, completed or active
  - Archived: if there are no copies on disk, only on archive
  - Tape: if the dataset is on RAW or RDST
  - Disk: otherwise
- FirstUsage: first time the dataset was used (in “week number”)
- LastUsage: last time the dataset was used (in “week number”)
- Now: current week number
- 1, 2, etc.: number of accesses in the last k weeks (column k). Note that these numbers are cumulative: what was accessed 1 week ago is also counted in the column for 2 weeks ago, and so on.
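The Storage field and the cumulative access columns are perhaps easiest to read as code. The sketch below is an interpretation of the descriptions above, not the agent's implementation: the function names and arguments are hypothetical, the rules are assumed to be tested in the order listed, and the Archived rule is read as "no replica on disk, at least one on archive".

```python
from itertools import accumulate

def assign_storage(production_status, nb_disk, nb_archived, on_raw_or_rdst):
    """Return the Storage value for a dataset (rules assumed to apply in listed order)."""
    if production_status.lower() in ("idle", "completed", "active"):
        return "Active"
    if nb_disk == 0 and nb_archived > 0:
        return "Archived"
    if on_raw_or_rdst:
        return "Tape"
    return "Disk"

def cumulative_accesses(per_week):
    """Turn per-week access counts into the cumulative columns 1, 2, ...

    per_week[0] is the number of accesses during the most recent week,
    per_week[1] the week before, and so on (hypothetical input layout).
    """
    return list(accumulate(per_week))

# Column 1 counts the last week only, column 2 the last two weeks, etc.
print(cumulative_accesses([5, 0, 3]))
print(assign_storage("Archived", nb_disk=0, nb_archived=10, on_raw_or_rdst=False))
```

Because the columns are cumulative, the accesses made during week k alone are the difference between column k and column k-1.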
Week number¶
The week number gives an easy way to compare the age of datasets. It is defined as year * 52 + week number.
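As a worked example of the formula, the value 104680 quoted for Creation-week corresponds to year 2013, week 4, since 2013 * 52 + 4 = 104680. The sketch below assumes the week within the year is the ISO week; the source does not say which week convention the agent uses, and `week_number` is a hypothetical helper name.

```python
from datetime import date

def week_number(day):
    """year * 52 + week, using the ISO year and ISO week of `day` (an assumption)."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return iso[0] * 52 + iso[1]

created = week_number(date(2013, 1, 23))  # a day in ISO week 4 of 2013
now = week_number(date(2013, 2, 6))       # a day in ISO week 6 of 2013
print(created)        # 2013 * 52 + 4
print(now - created)  # age of the dataset in weeks
```

Subtracting two week numbers gives an age in weeks. The result is approximate across year boundaries, because a year does not contain exactly 52 weeks.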