=================
Data distribution
=================

Policy
======

*******
Archive
*******

The default option is at `Operations//TransformationPlugins/ArchiveSEs`. It can be overridden in each plugin. The choice is made randomly.

*************
DST broadcast
*************

The broadcast done by the `LHCbDSTBroadcast` plugin is done according to the free space.

RAW files processing and distribution
=====================================

The RAW files all have a copy at CERN, and are then distributed across the Tier1s. The processing is shared between CERN and the Tier1s. The selection of the site for copying the data and of the site where the data will be processed (the so-called *RunDestination*) is done by the *RAWReplication* plugin. To do so, it uses shares that are defined in `Operations//Shares`.

**********************************************
Selection of a Tier1 for the data distribution
**********************************************

The quotas are defined in `Operations//Shares/RAW`. Since CERN has a copy of every file, it does not appear in the quotas. In practice, the absolute values are meaningless; what matters is their relative values. The total is normalized to 100 in the code.

When choosing where a run will be copied, we look at the current status of the distribution, based on the run duration. The site which is the furthest from its objective is selected.

********************************************
Selection of a Tier1 for the data processing
********************************************

Once a Tier1 has been selected to copy the RAW file, one needs to select the site where the data will be processed, the *RunDestination*: either CERN or the Tier1 where the data is. Note that the destination is chosen per run and does not change afterwards: all the productions will process the run at the same location.

This is done using `Operations//Shares/CPUforRAW`.
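
As an illustration of the Tier1 selection for the data distribution described in the previous section, here is a minimal sketch. It is not the code of the *RAWReplication* plugin: the function names, the site names and the numbers are hypothetical, and it assumes that "furthest from its objective" means the largest gap between the normalized share and the fraction of run duration already assigned to the site.

.. code-block:: python

    def normalize_shares(raw_shares):
        """Scale the shares to a total of 100; only relative values matter."""
        total = sum(raw_shares.values())
        return {site: 100.0 * value / total for site, value in raw_shares.items()}


    def select_site(raw_shares, run_duration_per_site):
        """Return the site whose current fraction is furthest below its share."""
        targets = normalize_shares(raw_shares)
        total_duration = sum(run_duration_per_site.get(site, 0.0) for site in targets)
        deficits = {}
        for site, target in targets.items():
            done = run_duration_per_site.get(site, 0.0)
            current = 100.0 * done / total_duration if total_duration else 0.0
            deficits[site] = target - current
        return max(deficits, key=deficits.get)


    # Hypothetical shares and accumulated run durations, for illustration only
    shares = {"SiteA": 30, "SiteB": 50, "SiteC": 20}
    durations = {"SiteA": 400.0, "SiteB": 400.0, "SiteC": 200.0}
    selected = select_site(shares, durations)
    print(selected)

With these numbers, SiteB holds 40% of the run duration for a share of 50 and is therefore the one selected.
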

The `CPUforRAW` values are independent: each should be between 0 and 1, and represents the fraction of the data copied to that site that will be processed there rather than at CERN. So if the value is 0.8, 80% of the data copied to that site will be processed at that site, and the other 20% at CERN.

This share is used by the processing plugin `DataProcessing`. The equivalent exists when reprocessing (plugin `DataReprocessing`): `Operations//Shares/CPUforReprocessing`.

******************************
Change of values in the shares
******************************

Note: if a change is to be made after a transformation has already distributed a lot of files, it is better to start a new transformation.

The principle goes as follows, but is obviously better done with an Excel sheet.

From CRIC (http://wlcg-cric.cern.ch/core/pledge/list/), we take for each T1 the CPUPledge (in MHS06) and the TapePledge (in PB). We easily deduce the CPUPledgePercent and TapePledgePercent.

From the StorageUsageSummary, we get the CurrentTapeUsage (e.g. ``dirac-dms-storage-usage-summary --LCG --Site LCG.CERN.cern``).

We then have::

    AdditionalTape = TapePledge - CurrentTapeUsage

From which we deduce AdditionalTapePercent. We then compute the ratio::

    CPU / NewTape = CPUPledgePercent / AdditionalTapePercent

It represents the increase of CPU pledge vs the increase of tape with respect to the total.

We then choose a certain percentage of data which is going to be processed at CERN, say 20%. We then get::

    CPUShare = CPUPledgePercent * (1 - 0.2)

The next step is to assign a CPUFraction (in [0:1]) by hand, following this guideline: the lower the CPU/Tape ratio, the lower the fraction processed "locally".

The final step is to compute::

    RAWShare = CPUShare / CPUFraction

It represents the percentage of data to be copied to the given T1.

Obviously, since we have an extra constraint (the RAW shares must add up to 100%), we have to leave one degree of freedom.
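
The arithmetic up to the RAWShare can be sketched as follows, as an alternative to a spreadsheet. This is only an illustration: the site names, pledges, tape usage and hand-picked CPUFraction values are all hypothetical, and the variable names simply mirror the quantities named in the text.

.. code-block:: python

    def percentages(values):
        """Express each value as a percentage of the total over all sites."""
        total = sum(values.values())
        return {site: 100.0 * value / total for site, value in values.items()}


    # Hypothetical inputs, for illustration only
    cpu_pledge = {"SiteA": 60.0, "SiteB": 40.0}      # CPUPledge, in MHS06
    tape_pledge = {"SiteA": 30.0, "SiteB": 20.0}     # TapePledge, in PB
    current_tape = {"SiteA": 20.0, "SiteB": 15.0}    # CurrentTapeUsage, in PB
    cern_fraction = 0.2                              # part of the data processed at CERN
    cpu_fraction = {"SiteA": 0.8, "SiteB": 0.96}     # CPUFraction, assigned by hand

    cpu_pledge_percent = percentages(cpu_pledge)
    additional_tape = {s: tape_pledge[s] - current_tape[s] for s in tape_pledge}
    additional_tape_percent = percentages(additional_tape)

    # CPU / NewTape ratio: guides the choice of CPUFraction (lower ratio, lower fraction)
    ratio = {s: cpu_pledge_percent[s] / additional_tape_percent[s] for s in cpu_pledge}

    cpu_share = {s: cpu_pledge_percent[s] * (1 - cern_fraction) for s in cpu_pledge}
    raw_share = {s: cpu_share[s] / cpu_fraction[s] for s in cpu_pledge}

    for site in sorted(raw_share):
        print(site, round(ratio[site], 2), round(cpu_share[site], 2), round(raw_share[site], 2))

In this sketch the RAW shares are not forced to add up to 100; they are simply the values obtained from the formulas above for each site taken independently.
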

This degree of freedom is normally given to RAL, with the following::

    RALRAWShare = 100% - Sum(OtherShares)
    RALCPUFraction = RALCPUShare / RALRAWShare

CPUFraction corresponds to `Operations//Shares/CPUforRAW`.

RAWShare corresponds to `Operations//Shares/RAW`.

MonteCarlo distribution
=======================

The distribution of MC relies on the `LHCbMCDstBroadcast` plugin. In order to know what to replicate, we use a wildcard in the bookkeeping query, and for each individual path, we start a transformation. To be sure not to start the same one several times, we use the `--Unique` option.

The difficulty is to know for which year and which sim version to start. Gloria or Vladimir can tell you...

In order to list the BK paths that are going to be replicated::

    for year in 2011 2012 2015 2016; do for sim in Sim09b Sim09c; do dirac-dms-add-transformation --List --BK=/MC/$year//$sim/...Reco...; done; done

    List of processing passes for BK path /MC/2011//Sim09b/...Reco...
        /Sim09b/Reco14c
        /Sim09b/Reco14c/Stripping21r1NoPrescalingFlagged
        /Sim09b/Trig0x40760037/Reco14c
        /Sim09b/Trig0x40760037/Reco14c/Stripping20r1Filtered
        /Sim09b/Trig0x40760037/Reco14c/Stripping20r1NoPrescalingFlagged
        /Sim09b/Trig0x40760037/Reco14c/Stripping21r1Filtered
        /Sim09b/Trig0x40760037/Reco14c/Stripping21r1NoPrescalingFlagged
        /Sim09b/Trig0x40760037/Reco14c/Stripping21r1p1Filtered
        /Sim09b/Trig0x40760037/Reco14c/Stripping21r1p1NoPrescalingFlagged

In order to actually start these replications::

    for year in 2011 2012 2015 2016; do for sim in Sim09b Sim09c; do dirac-dms-add-transformation --Plugin LHCbMCDSTBroadcastRandom --BK=/MC/$year//$sim/...Reco... --Unique --Start; done; done

Standing Transformations
========================

It is useful to have some transformations always at hand, to which you can just add a few files.
Here are a few::

    # Replicate files to freezer
    dirac-dms-add-transformation --Plugin ReplicateDataset --Destination CERN-FREEZER-EOS --Name 'Replicate-to-Freezer' --Force

    # Replicate to local buffer
    dirac-dms-add-transformation --Plugin=ReplicateToLocalSE --Dest=Tier1-Buffer --Name 'Replicate-to-local-Buffer' --Force

    # Replicate to RAW
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN-RAW --Name 'Replicate-to-CERN-RAW' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA-RAW --Name 'Replicate-to-GRIDKA-RAW' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL-RAW --Name 'Replicate-to-RAL-RAW' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3-RAW --Name 'Replicate-to-IN2P3-RAW' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF-RAW --Name 'Replicate-to-CNAF-RAW' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI-RAW --Name 'Replicate-to-RRCKI-RAW' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC-RAW --Name 'Replicate-to-PIC-RAW' --Force

    # Replicate to DST
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN-DST-EOS --Name 'Replicate-to-CERN-DST-EOS' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA-DST --Name 'Replicate-to-GRIDKA-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL-DST --Name 'Replicate-to-RAL-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3-DST --Name 'Replicate-to-IN2P3-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF-DST --Name 'Replicate-to-CNAF-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI-DST --Name 'Replicate-to-RRCKI-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC-DST --Name 'Replicate-to-PIC-DST' --Force

    # Replicate to MC-DST
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN_MC-DST-EOS --Name 'Replicate-to-CERN_MC-DST-EOS' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA_MC-DST --Name 'Replicate-to-GRIDKA_MC-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL_MC-DST --Name 'Replicate-to-RAL_MC-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3_MC-DST --Name 'Replicate-to-IN2P3_MC-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF_MC-DST --Name 'Replicate-to-CNAF_MC-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI_MC-DST --Name 'Replicate-to-RRCKI_MC-DST' --Force
    dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC_MC-DST --Name 'Replicate-to-PIC_MC-DST' --Force

    # To reduce number of replicas, based on data popularity
    dirac-dms-add-transformation --Plugin ReduceReplicas --Number 1 --Name 'Reduce-to-1-replica' --Force
    dirac-dms-add-transformation --Plugin ReduceReplicas --Number 2 --Name 'Reduce-to-2-replicas' --Force
    dirac-dms-add-transformation --Plugin ReduceReplicas --Number 3 --Name 'Reduce-to-3-replicas' --Force

    # Remove replicas from Tier1-Buffer or Tier1-DST
    dirac-dms-add-transformation --Plugin RemoveReplicas --From Tier1-Buffer --Name 'Remove-from-Tier1-Buffer' --Force
    dirac-dms-add-transformation --Plugin RemoveReplicas --From Tier1-DST --Name 'Remove-from-Tier1-DST' --Force

    # Destroy dataset
    dirac-dms-add-transformation --Plugin=DestroyDataset --Name 'Destroy-dataset' --Force

As a reminder, to add files in these transformations::

    dirac-transformation-add-files [--Term | --File | --LFN ]