Data distribution

Policy

Archive

The default option is at Operations/<Setup>/TransformationPlugins/ArchiveSEs. It can be overridden in each plugin. The archive SE is chosen randomly from that list.
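
For illustration, the snippet below reads this option with the DIRAC Operations helper and picks an SE at random, mimicking what the plugins do. This is a minimal sketch, not the plugin code; it assumes a working DIRAC client environment and that getValue returns a list when given a list default:

from DIRAC.Core.Base import Script
Script.parseCommandLine()

import random
from DIRAC.ConfigurationSystem.Client.Helpers.Operations import Operations

# The path is resolved under Operations/<Setup>/ for the current setup
archiveSEs = Operations().getValue('TransformationPlugins/ArchiveSEs', [])
if archiveSEs:
    print('Would archive to %s' % random.choice(archiveSEs))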

DST broadcast

The broadcast done by the LHCbDSTBroadcast plugin is driven by the free space available at the candidate destinations.
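
The precise weighting is internal to the plugin; as a simplified illustration only, the sketch below picks a destination with probability proportional to its free space (the function, SE names and numbers are invented for the example):

import random

def chooseDestination(freeSpace):
    """freeSpace: dict {SE name: free space in TB} (hypothetical values)."""
    total = sum(freeSpace.values())
    pick = random.uniform(0, total)
    cumulative = 0.0
    for se, space in freeSpace.items():
        cumulative += space
        if pick <= cumulative:
            return se

print(chooseDestination({'CNAF-DST': 120.0, 'GRIDKA-DST': 80.0, 'RAL-DST': 40.0}))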

RAW files processing and distribution

The RAW files all have a copy at CERN, and are then distributed across the Tier1s. The processing is shared between CERN and the Tier1s.

The selection of the site to which the data is copied and of the site where it will be processed (the so-called RunDestination) is done by the RAWReplication plugin. To do so, it uses shares that are defined in Operations/<Setup>/Shares.

Selection of a Tier1 for the data distribution

The quotas are defined in Operations/<Setup>/Shares/RAW.

Since CERN has a copy of every file, it does not appear in the quota.

In practice the absolute values are meaningless; only their relative values matter. The total is normalized to 100 in the code.

When choosing where a run will be copied, we look at the current status of the distribution, accumulated in terms of run duration, and select the site that is furthest below its target.
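
The selection can be sketched as follows (a simplified illustration with invented numbers, not the actual RAWReplication code):

def selectSite(shares, assignedDuration):
    """shares: {site: quota, arbitrary units}; assignedDuration: {site: run seconds already assigned}."""
    total = float(sum(shares.values()))
    targets = dict((site, 100.0 * share / total) for site, share in shares.items())
    totalAssigned = float(sum(assignedDuration.values())) or 1.0
    current = dict((site, 100.0 * assignedDuration.get(site, 0) / totalAssigned) for site in shares)
    # The site with the largest deficit with respect to its target gets the next run
    return max(shares, key=lambda site: targets[site] - current[site])

# Hypothetical figures: RAL is furthest below its target, so it is selected
print(selectSite({'RAL': 30, 'CNAF': 20, 'GRIDKA': 20, 'IN2P3': 30},
                 {'RAL': 1000, 'CNAF': 2000, 'GRIDKA': 1500, 'IN2P3': 3000}))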

Selection of a Tier1 for the data processing

Once a Tier1 has been selected to receive a copy of the RAW files, one needs to select the site where the data will be processed: either CERN or the Tier1 holding the copy. This is the so-called RunDestination. Note that the destination is chosen per run and never changes: all productions will process the run at the same location.

This is done using Operations/<Setup>/Shares/CPUforRAW. There, the values are independent of each other: each should be between 0 and 1 and represents the fraction of the data copied to that site that will be processed there rather than at CERN. So if the value is 0.8, 80% of the data copied to that site will be processed at that site, and the other 20% at CERN.

This share is used by the processing plugin DataProcessing. The equivalent exists when reprocessing (plugin DataReprocessing): Operations/<Setup>/Shares/CPUforReprocessing.
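
As a simplified illustration (not the actual plugin code), the run destination could be drawn as below; the real plugin may track accumulated statistics rather than drawing at random, and the site name and fraction are only examples:

import random

def chooseRunDestination(tier1, cpuForRAW):
    """cpuForRAW: {site: fraction in [0, 1] of its data processed locally rather than at CERN}."""
    fraction = cpuForRAW.get(tier1, 0.0)
    return tier1 if random.random() < fraction else 'CERN'

# With a value of 0.8, about 80% of the runs copied to GRIDKA get GRIDKA as RunDestination
print(chooseRunDestination('GRIDKA', {'GRIDKA': 0.8}))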

Change of values in the shares

Note: if a change is to be made after a transformation has already distributed a lot of files, it is better to start a new transformation.

The principle goes as follows; it is obviously easier to carry out with a spreadsheet. A worked sketch is also given at the end of this walkthrough.

From CRIC (http://wlcg-cric.cern.ch/core/pledge/list/), we take for each T1 the CPUPledge (in MHS06) and the TapePledge (in PB). From these we easily deduce the CPUPledgePercent and TapePledgePercent (each site's fraction of the respective total).

From the StorageUsageSummary, we get the CurrentTapeUsage (e.g. dirac-dms-storage-usage-summary --LCG --Site LCG.CERN.cern).

We then have:

AdditionalTape = TapePledge - CurrentTapeUsage

From which we deduce AdditionalTapePercent.

We then compute the ratio:

CPU/AdditionalTape = CPUPledgePercent / AdditionalTapePercent

It represents the site's fraction of the total CPU pledge relative to its fraction of the total additional tape: a low value means relatively little CPU compared to the new tape the site will receive.

We then choose a certain percentage of the data that is going to be processed at CERN. Say 20%. We then get:

CPUShare = CPUPledgePercent*(1-0.2)

The next step is to assign a CPUFraction (in [0:1]) by hand following this guideline: the lower the CPU/Tape ratio, the lower the fraction processed “locally”.

The final step is to compute:

RAWShare = CPUShare/CPUFraction

It represents the percentage of data to be copied to the given T1.

Since the RAW shares must add up to 100%, we have one extra constraint and must give up one degree of freedom. We normally give it to RAL with the following:

RALRAWShare = 100% - Sum(OtherShares)
RALCPUFraction = RALCPUShare / RALRAWShare

CPUFraction corresponds to Operations/<Setup>/Shares/CPUforRAW

RAWShare corresponds to Operations/<Setup>/Shares/RAW
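
As announced above, here is a worked sketch of the recipe in Python. All pledge and usage figures are invented for illustration; only the formulas follow the walkthrough above, and this is not an official tool:

cernFraction = 0.2  # fraction of the data processed at CERN

# Per-Tier1 inputs: CPU pledge (MHS06), tape pledge (PB), current tape usage (PB) -- all made up
sites = {
    'CNAF':   {'cpuPledge': 60.0, 'tapePledge': 10.0, 'currentTape': 6.0},
    'GRIDKA': {'cpuPledge': 80.0, 'tapePledge': 12.0, 'currentTape': 7.0},
    'IN2P3':  {'cpuPledge': 70.0, 'tapePledge': 11.0, 'currentTape': 6.5},
    'RAL':    {'cpuPledge': 90.0, 'tapePledge': 14.0, 'currentTape': 8.0},
}

totalCPU = sum(s['cpuPledge'] for s in sites.values())
for s in sites.values():
    s['cpuPledgePercent'] = 100.0 * s['cpuPledge'] / totalCPU
    s['additionalTape'] = s['tapePledge'] - s['currentTape']
totalNewTape = sum(s['additionalTape'] for s in sites.values())
for s in sites.values():
    s['additionalTapePercent'] = 100.0 * s['additionalTape'] / totalNewTape
    s['cpuOverTape'] = s['cpuPledgePercent'] / s['additionalTapePercent']  # guideline for CPUFraction
    s['cpuShare'] = s['cpuPledgePercent'] * (1 - cernFraction)

# CPUFraction assigned by hand: the lower cpuOverTape, the lower the local fraction (values invented)
cpuFraction = {'CNAF': 0.7, 'GRIDKA': 0.8, 'IN2P3': 0.75}
for name in cpuFraction:
    sites[name]['rawShare'] = sites[name]['cpuShare'] / cpuFraction[name]

# RAL absorbs the remaining degree of freedom
sites['RAL']['rawShare'] = 100.0 - sum(sites[n]['rawShare'] for n in cpuFraction)
cpuFraction['RAL'] = sites['RAL']['cpuShare'] / sites['RAL']['rawShare']

for name in sorted(sites):
    print('%-8s RAW share %5.1f%%   CPUforRAW %.2f' % (name, sites[name]['rawShare'], cpuFraction[name]))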

MonteCarlo distribution

The distribution of MC relies on the LHCbMCDstBroadcast plugin. In order to know what to replicate, we use a wildcard in the bookkeeping query and start one transformation per matching path. To be sure not to start the same one several times, we use the --Unique option.

The difficulty is to know for which year and which sim version to start. Gloria or Vladimir can tell you…

In order to list the BK paths that are going to be replicated:

for year in 2011 2012 2015 2016;
do
  for sim in Sim09b Sim09c;
  do
    dirac-dms-add-transformation --List --BK=/MC/$year//$sim/...Reco...;
  done;
done


List of processing passes for BK path /MC/2011//Sim09b/...Reco...

/Sim09b/Reco14c
/Sim09b/Reco14c/Stripping21r1NoPrescalingFlagged
/Sim09b/Trig0x40760037/Reco14c
/Sim09b/Trig0x40760037/Reco14c/Stripping20r1Filtered
/Sim09b/Trig0x40760037/Reco14c/Stripping20r1NoPrescalingFlagged
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1Filtered
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1NoPrescalingFlagged
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1p1Filtered
/Sim09b/Trig0x40760037/Reco14c/Stripping21r1p1NoPrescalingFlagged

In order to actually start these replications:

for year in 2011 2012 2015 2016;
do
  for sim in Sim09b Sim09c;
  do
    dirac-dms-add-transformation --Plugin LHCbMCDSTBroadcastRandom --BK=/MC/$year//$sim/...Reco... --Unique --Start;
  done;
done

Standing Transformations

It is useful to have some transformations always at hand to which you can just add a few files. Here are a few:

# Replicate files to freezer
dirac-dms-add-transformation --Plugin ReplicateDataset --Destination CERN-FREEZER-EOS --Name 'Replicate-to-Freezer' --Force

# Replicate to local buffer
dirac-dms-add-transformation --Plugin=ReplicateToLocalSE --Dest=Tier1-Buffer --Name 'Replicate-to-local-Buffer' --Force


# Replicate to RAW
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN-RAW --Name 'Replicate-to-CERN-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA-RAW --Name 'Replicate-to-GRIDKA-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL-RAW --Name 'Replicate-to-RAL-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3-RAW --Name 'Replicate-to-IN2P3-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF-RAW --Name 'Replicate-to-CNAF-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI-RAW --Name 'Replicate-to-RRCKI-RAW' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC-RAW --Name 'Replicate-to-PIC-RAW' --Force


# Replicate to DST
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN-DST-EOS --Name 'Replicate-to-CERN-DST-EOS' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA-DST --Name 'Replicate-to-GRIDKA-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL-DST --Name 'Replicate-to-RAL-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3-DST --Name 'Replicate-to-IN2P3-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF-DST --Name 'Replicate-to-CNAF-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI-DST --Name 'Replicate-to-RRCKI-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC-DST --Name 'Replicate-to-PIC-DST' --Force


# Replicate to MC-DST
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CERN_MC-DST-EOS --Name 'Replicate-to-CERN_MC-DST-EOS' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=GRIDKA_MC-DST --Name 'Replicate-to-GRIDKA_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RAL_MC-DST --Name 'Replicate-to-RAL_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=IN2P3_MC-DST --Name 'Replicate-to-IN2P3_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=CNAF_MC-DST --Name 'Replicate-to-CNAF_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=RRCKI_MC-DST --Name 'Replicate-to-RRCKI_MC-DST' --Force
dirac-dms-add-transformation --Plugin=ReplicateDataset --Dest=PIC_MC-DST --Name 'Replicate-to-PIC_MC-DST' --Force




# To reduce number of replicas, based on data popularity
dirac-dms-add-transformation --Plugin ReduceReplicas --Number 1 --Name 'Reduce-to-1-replica' --Force
dirac-dms-add-transformation --Plugin ReduceReplicas --Number 2 --Name 'Reduce-to-2-replicas' --Force
dirac-dms-add-transformation --Plugin ReduceReplicas --Number 3 --Name 'Reduce-to-3-replicas' --Force


# Remove replicas from Tier1-Buffer or Tier1-DST
dirac-dms-add-transformation --Plugin RemoveReplicas --From Tier1-Buffer --Name 'Remove-from-Tier1-Buffer' --Force
dirac-dms-add-transformation --Plugin RemoveReplicas --From Tier1-DST --Name 'Remove-from-Tier1-DST' --Force

# destroy dataset
dirac-dms-add-transformation --Plugin=DestroyDataset --Name 'Destroy-dataset' --Force

As a reminder, to add files in these transformations:

dirac-transformation-add-files <transName> [--Term | --File <file> | --LFN <lfn>]