Data distribution¶
Policy¶
Archive¶
The default option is at Operations/<Setup>/TransformationPlugins/ArchiveSEs. It can be overridden in each plugin. The choice among the archive SEs is made randomly.
DST broadcast¶
The broadcast performed by the LHCbDSTBroadcast plugin is done according to the free space.
RAW files processing and distribution¶
All RAW files have a copy at CERN and are then distributed across the Tier1s. The processing is shared between CERN and the Tier1s.
The RAWReplication plugin selects both the site to which the data is copied and the site where the data will be processed (the so-called RunDestination). To do so, it uses the shares defined in Operations/<Setup>/Shares.
Selection of a Tier1 for the data distribution¶
The quotas are defined in Operations/<Setup>/Shares/RAW.
Since CERN has a copy of every file, it does not appear in the quota.
In practice, the absolute values are meaningless; what matters is their relative values. The total is normalized to 100 in the code.
When choosing where a run will be copied, we look at the current state of the distribution, based on the run duration. The site that is furthest from its objective is selected.
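The selection rule can be illustrated with a small sketch. This is not the RAWReplication plugin's code: the `pick_site` helper, the site names, the share values and the accumulated run durations are all made up for illustration. The sketch assumes that "furthest from its objective" means the largest gap between a site's normalized share and its current fraction of the total run duration.

```shell
# Sketch only: pick_site is a hypothetical helper, not the RAWReplication plugin.
# Input lines on stdin: "<site> <share> <run duration already assigned>"
pick_site() {
    awk '
    {
        site[NR] = $1; share[NR] = $2; used[NR] = $3
        total_share += $2; total_used += $3
    }
    END {
        best = ""
        for (i = 1; i <= NR; i++) {
            # Share normalized so that all shares add up to 100
            target = 100 * share[i] / total_share
            # Current percentage of the total run duration held by this site
            current = (total_used > 0) ? 100 * used[i] / total_used : 0
            deficit = target - current
            # Keep the site with the largest deficit (first one wins a tie)
            if (best == "" || deficit > best_deficit) {
                best = site[i]; best_deficit = deficit
            }
        }
        print best
    }'
}

# Shares 2:3:5 are equivalent to 20:30:50, only the ratios matter.
printf '%s\n' 'SITE-A 2 300' 'SITE-B 3 300' 'SITE-C 5 400' | pick_site   # prints SITE-C
```

In this example the normalized targets are 20, 30 and 50, while the sites currently hold 30%, 30% and 40% of the run duration, so SITE-C is the one furthest below its target.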
Selection of a Tier1 for the data processing¶
Once a Tier1 has been selected to receive the copy of the RAW file, a site must be selected where the data will be processed, the RunDestination: either CERN or the Tier1 that holds the data. Note that the destination is chosen per run and does not change afterwards: all productions will process the run at the same location.
This is done using Operations/<Setup>/Shares/CPUforRAW. There, the values are independent of each other: each should be between 0 and 1 and represents the fraction of the data copied to that site that will be processed there, the rest being processed at CERN. So if the value is 0.8, 80% of the data copied to that site will be processed at that site, and the other 20% at CERN.
This share is used by the processing plugin DataProcessing. The equivalent for reprocessing (plugin DataReprocessing) is Operations/<Setup>/Shares/CPUforReprocessing.
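As a worked example of how a CPUforRAW value translates into a split between a Tier1 and CERN, the arithmetic can be sketched as below. The `expected_split` helper is hypothetical and not part of DIRAC. It only computes the expected split; since the destination is chosen per run, the actual split will only approximate it.

```shell
# Sketch only: expected_split is a hypothetical helper, not part of DIRAC.
# $1: CPUforRAW value of the Tier1 (expected to be between 0 and 1)
# $2: amount of data copied to that Tier1 (any unit, e.g. number of runs)
expected_split() {
    awk -v share="$1" -v copied="$2" 'BEGIN {
        at_site = share * copied              # processed at the Tier1
        at_cern = copied - at_site            # the rest is processed at CERN
        printf "Tier1=%.1f CERN=%.1f\n", at_site, at_cern
    }'
}

expected_split 0.8 50   # prints: Tier1=40.0 CERN=10.0
```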
MonteCarlo distribution¶
The distribution of MC relies on the LHCbMCDstBroadcast plugin. To know what to replicate, we use a wildcard in the bookkeeping query and start one transformation for each individual path. To make sure the same transformation is not started several times, we use the --Unique option.
The difficulty is knowing for which years and which Sim versions to start them. Gloria or Vladimir can tell you…
In order to list the BK paths that are going to be replicated:
for year in 2011 2012 2015 2016; do
    for sim in Sim09b Sim09c; do
        dirac-dms-add-transformation --List --BK=/MC/$year//$sim/...Reco...
    done
done
List of processing passes for BK path /MC/2011//Sim09b/...Reco...
/Sim09b/Reco14c
/Sim09b/Reco14c/Stripping21r1NoPrescalingFlagged
/Sim09b/Trig0x40760037/Reco14c
/Sim09b/Trig0x40760037/Reco14c/Stripping20r1Filtered
/Sim09b/Trig0x40760037/Reco14c/Stripping20r1NoPrescalingFlagged
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1Filtered
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1NoPrescalingFlagged
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1p1Filtered
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1p1NoPrescalingFlagged
In order to actually start these replications:
for year in 2011 2012 2015 2016; do
    for sim in Sim09b Sim09c; do
        dirac-dms-add-transformation --Plugin LHCbMCDSTBroadcastRandom --BK=/MC/$year//$sim/...Reco... --Unique --Start
    done
done
Standing Transformations¶
It is useful to keep a few transformations always at hand, to which you can simply add files when needed. Here are some:
# Replicate files to freezer
dirac-dms-add-transformation --Plugin ReplicateDataset --Destination CERN-FREEZER-EOS --Name 'Replicate-to-Freezer' --Force
# Replicate to local buffer
dirac-dms-add-transformation --Plugin=ReplicateToLocalSE --Dest=Tier1-Buffer --Name 'Replicate-to-local-Buffer' --Force
# Replicate to RAW
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN-RAW --Name 'Replicate-to-CERN-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA-RAW --Name 'Replicate-to-GRIDKA-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL-RAW --Name 'Replicate-to-RAL-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3-RAW --Name 'Replicate-to-IN2P3-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF-RAW --Name 'Replicate-to-CNAF-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI-RAW --Name 'Replicate-to-RRCKI-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC-RAW --Name 'Replicate-to-PIC-RAW' --Force
# Replicate to DST
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN-DST-EOS --Name 'Replicate-to-CERN-DST-EOS' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA-DST --Name 'Replicate-to-GRIDKA-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL-DST --Name 'Replicate-to-RAL-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3-DST --Name 'Replicate-to-IN2P3-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF-DST --Name 'Replicate-to-CNAF-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI-DST --Name 'Replicate-to-RRCKI-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC-DST --Name 'Replicate-to-PIC-DST' --Force
# Replicate to MC-DST
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN_MC-DST-EOS --Name 'Replicate-to-CERN_MC-DST-EOS' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA_MC-DST --Name 'Replicate-to-GRIDKA_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL_MC-DST --Name 'Replicate-to-RAL_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3_MC-DST --Name 'Replicate-to-IN2P3_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF_MC-DST --Name 'Replicate-to-CNAF_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI_MC-DST --Name 'Replicate-to-RRCKI_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC_MC-DST --Name 'Replicate-to-PIC_MC-DST' --Force
# To reduce number of replicas, based on data popularity
dirac-dms-add-transformation --Plugin ReduceReplicas --Number 1 --Name 'Reduce-to-1-replica' --Force
dirac-dms-add-transformation --Plugin ReduceReplicas --Number 2 --Name 'Reduce-to-2-replicas' --Force
dirac-dms-add-transformation --Plugin ReduceReplicas --Number 3 --Name 'Reduce-to-3-replicas' --Force
# Remove from Tier1-Buffer or Tier1-DST
dirac-dms-add-transformation --Plugin RemoveReplicas --From Tier1-Buffer --Name 'Remove-from-Tier1-Buffer' --Force
dirac-dms-add-transformation --Plugin RemoveReplicas --From Tier1-DST --Name 'Remove-from-Tier1-DST' --Force
# Destroy dataset
dirac-dms-add-transformation --Plugin=DestroyDataset --Name 'Destroy-dataset' --Force
As a reminder, to add files to these transformations:
dirac-transformation-add-files <transName> [--Term | --File <file> | --LFN <lfn>]
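A concrete invocation might look like the sketch below. The `add_to_standing` wrapper and the `lfns.txt` file are hypothetical, and the sketch assumes that `--File` takes a text file listing the LFNs to add. By default the wrapper only prints the command, so it can be checked before anything is submitted.

```shell
# Sketch only: add_to_standing and lfns.txt are hypothetical placeholders.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
add_to_standing() {
    trans_name=$1   # name of the standing transformation
    lfn_file=$2     # assumed: text file listing the LFNs to add
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo dirac-transformation-add-files "$trans_name" --File "$lfn_file"
    else
        dirac-transformation-add-files "$trans_name" --File "$lfn_file"
    fi
}

add_to_standing 'Replicate-to-Freezer' lfns.txt
```

Setting `DRY_RUN=0` in the environment runs the command for real, which requires a configured DIRAC client.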