The certification process¶
Certifying a release is a process. There are a number of steps to make to reach the point in which we can finally say that a release is at production level. Within LHCbDirac, we are trying to streamline and automatize this process as much as possible. Even with that, some tests still require manual intervention. We can split the process in a series of incremental tests.
Within the following sections we describe, step by step, all the actions needed.
The whole certification process varies from release to release. The list of things to do is maintained in trello boards.
Unit test¶
When a new release candidate is created from the devel branch, we first run pylint on the whole codebase, and all the unit tests. Jenkins automizes this for us.
Integration and Regression tests¶
Run by Jenkins.
System tests¶
Even if it should not be considered strictly as a test, running all the agents and service within certification is an action to take. Agents and services spits errors and exceptions. While the second are obviously bugs, the first are not to be considered bugs until an expert look. Nonetheless, we have created a tool to easily identify all new exceptions and errors:
codeLocation=https://gitlab.cern.ch/lhcb-dirac/LHCbDIRAC/raw/devel/tests/System/LogsParser/
mkdir /tmp/logTest
cd /tmp/logTest
wget -r -np -nH --cut-dirs=7 $codeLocation
/bin/bash logParser.sh
For testing that the RMS works, there is an ad-hoc test:
wget http://github.com/DIRACGrid/DIRAC/blob/integration/DataManagementSystem/test/IntegrationFCT.py
python IntegrationFCT.py lhcb_user CERN-USER RAL-USER CNAF-USER
python IntegrationFCT.py lhcb_prod CERN-FAILOVER RAL-FAILOVER CNAF-FALIOVER
Those commands will create and put to the Request Management System two new requests:
- for lhcb_user group, which should be banned from using the FTS system
- for lhcb_prod or lhcb_prmgr group, which this should be executed using FTS
You could monitor their execution using Request monitor web page or by using CLI comamnd:
dirac-rms-show-request test<userName>-<userGroup>
The execution itself will take a while, but at the end both requests statuses should be set to ‘Done’.
Another test, again for the RMS, combined with FTS, is to simply use the following standard DIRAC scripts:
dirac-dms-create-replication-request CNAF_MC-DST /lhcb/certification/test/ALLSTREAMS.DST/00000751/0000/00000751_00000014_1.allstreams.dst
Which will actually schedule the replication of such file using FTS. This will print an ID that can be used for the script
dirac-rms-show-request ID
That should show how the request goes (quickly) in status “Scheduled”, and then “Done”.
The following script, instead, will remove the copy just created.
dirac-dms-create-removal-request CNAF_MC-DST /lhcb/certification/test/ALLSTREAMS.DST/00000751/0000/00000751_00000014_1.allstreams.dst
Again, monitoring is available as above.
For testing the replications and removals, use the following:
dirac-dms-add-replication --BKQuery=/validation/MC11a/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts/Sim05/Trig0x40760037Flagged/Reco12a/Stripping17Flagged/12463412/ALLSTREAMS.DST --Plugin=ReplicateDataset --Test
That will just print out how many files can be replicated. If there is at least one file (for this particular query there should be 35), then you can start it with:
dirac-dms-add-replication --BKQuery=/validation/MC11a/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts/Sim05/Trig0x40760037Flagged/Reco12a/Stripping17Flagged/12463412/ALLSTREAMS.DST --Plugin=ReplicateDataset --NumberOfReplicas=2 --SecondarySEs Tier1-DST --Start
You can monitor the advancement using:
dirac-dms-replica-stats --BKQuery=/validation/MC11a/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts/Sim05/Trig0x40760037Flagged/Reco12a/Stripping17Flagged/12463412/ALLSTREAMS.DST
Which should tell you the replica statistics, something like:
[fstagni@lxplus0032 ~]$ dirac-dms-replica-stats --BKQuery=/validation/MC11a/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts/Sim05/Trig0x40760037Flagged/Reco12a/Stripping17Flagged/12463412/ALLSTREAMS.DST
Executing BK query: {'Visible': 'Yes', 'ConfigName': 'validation', 'ConditionDescription': 'Beam3500GeV-2011-MagDown-Nu2-EmNoCuts', 'EventType': '12463412', 'FileType': 'ALLSTREAMS.DST', 'ConfigVersion': 'MC11a', 'ProcessingPass': '/Sim05/Trig0x40760037Flagged/Reco12a/Stripping17Flagged', 'SimulationConditions': 'Beam3500GeV-2011-MagDown-Nu2-EmNoCuts'}
34 files (0.0 TB) in directories:
/lhcb/validation/MC11a/ALLSTREAMS.DST/00000654/0000 34 files
34 files found with replicas
Replica statistics:
0 archives: 0 files
1 archives: 25 files
2 archives: 9 files
0 replicas: 0 files
1 replicas: 0 files
2 replicas: 0 files
3 replicas: 33 files
4 replicas: 0 files
5 replicas: 1 files
SE statistics:
CERN-ARCHIVE: 15 files
CNAF-ARCHIVE: 5 files
GRIDKA-ARCHIVE: 11 files
IN2P3-ARCHIVE: 1 files
RAL-ARCHIVE: 8 files
SARA-ARCHIVE: 3 files
CERN_MC_M-DST: 34 files
CNAF_MC-DST: 4 files
CNAF_MC_M-DST: 8 files
GRIDKA_MC-DST: 1 files
GRIDKA_MC_M-DST: 3 files
IN2P3_MC-DST: 9 files
IN2P3_MC_M-DST: 6 files
PIC_MC-DST: 5 files
PIC_MC_M-DST: 4 files
RAL_MC-DST: 20 files
RAL_MC_M-DST: 6 files
SARA_MC-DST: 3 files
SARA_MC_M-DST: 1 files
Sites statistics:
LCG.CERN.ch: 34 files
LCG.CNAF.it: 12 files
LCG.GRIDKA.de: 4 files
LCG.IN2P3.fr: 15 files
LCG.PIC.es: 9 files
LCG.RAL.uk: 26 files
LCG.SARA.nl: 4 files
Later, when you see that at least 2 replicas exist, you can issue
dirac-dms-add-replication --BKQuery=/validation/MC11a/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts/Sim05/Trig0x40760037Flagged/Reco12a/Stripping17Flagged/12463412/ALLSTREAMS.DST --Plugin=DeleteReplicas --NumberOfReplicas=1 --Start
Acceptance test steps¶
Installation of LHCbDirac¶
Login to a machine where LHCbDirac is already installed. Set the LHCbDirac environment, get a proxy with admin rights and launch the sysadmin CLI
lb-run LHCbDirac/prod bash
lhcb-proxy-init -g diracAdmin
dirac-admin-sysadmin-cli
Update the LHCbDirac version and restart all the services
set host volhcbXX.cern.ch
update LHCb-vArBpC
restart *
Change the version of the pilot in the CS. Go to the web portal, login with your certificate and the role diracAdmin. Click on Systems, Configuration and Manage Remote configuration.
The version is in the section /Operations/lhcb/LHCb-Certification/Versions/PilotVersion. Clicks on the PilotVersion and on change option value. Once you have changed the version number, click on submit. and do not forget to commit the change.
So you click on the left column on Commit Configuration
Now you should restart the task queue director
cd /opt/dirac/runit
runsvctrl d WorkloadManagement/TaskQueueDirector
runsvctrl u WorkloadManagement/TaskQueueDirector
Production test activity¶
Open your browser and connect to the certification instance of the LHCbDirac web portal (http://lhcb-cert-dirac.cern.ch) select the setup LHCb-Certification and load your certificate in the portal. Check that that your role is lhcb_user. Go to the tab Production and click on the Requests choice
Click on the production which is defined label “template for certification” (nb = 28) and in the menu which appears select Duplicate
You are ask if you want to Clear the processing pass in the copy. Select No. This will keep all the steps which are pre-defined.
The new request is created and you get a number that will appear in the web page.
Click on the new request that you just created the step below and select the edit option
Then modify all the fields which needs a new value. Once you have finished, submit your request to the production team.
You have just to approve it.
Now you should change your role to become lhcb_tech and lhcb_ppg to validate the request. You click on the new request and in the menu you choose the option sign
You can sign or reject the request.
Once the request has been accepted by lhcb_ppg and lhcb_tech, the status become accepted. Choose now the role lhcb_pmgr and click on the request. Then choose the option edit
You give the correct Event Type and number of Events. Then you click on Generate At this stage you are asked to choose which template should be used. In our case we will choose “MC_Simulation_run.py” and click on the next button.
You get now the list of value that you could change before submitting the production. For the certification purpose you should change the value for “MC configuratioon name” to be certification, the “configuration version” should be test. Verify which plugin you want to use, the number of event that you want to process, the cputimelimit,… Once you have finished, click on the generate button.
After the generation of the production you will get in a new window the production ID and the number of jobs generated. If you want you can see and save the script which will generate this production by clicking on the script preview button.
This is the window of the python script which could be used to generate again the production. To exit thi swindow click on cancel
If you click on the request and you choose production monitor you will be re-direct to the production monitor.
Production monitor with the fresh generated productions.
dirac-bookkeeping-production-informations 830 -o /DIRAC/Setup=LHCb-Certification
lxplus448] x86_64-slc5-gcc46-opt /afs/cern.ch/user/j/joel> dirac-bookkeeping-production-informations 830 -o /DIRAC/Setup=LHCb-Certification
Production Info:
Configuration Name: LHCb
Configuration Version: Collision11
Event type: 91000000
-----------------------
StepName: merging MDF
ApplicationName : mergeMDF
ApplicationVersion : None
OptionFiles : None
DDDB : None
CONDDB : None
ExtraPackages :None
-----------------------
Number of Steps 1
Total number of files: 2
LOG:1
RAW:1
Number of events
File Type Number of events Event Type EventInputStat
RAW 30988 91000000 30988
Path: /LHCb/Collision11/Beam3500GeV-VeloClosed-MagDown/Real Data/Merging
/LHCb/Collision11/Beam3500GeV-VeloClosed-MagDown/Real Data/Merging/91000000/RAW
You can then check the produced files:
nsls -l /castor/cern.ch/grid/lhcb/certification/test/ALLSTREAMS.DST/00000225/0000
dirac-dms-lfn-replicas /lhcb/certification/test/ALLSTREAMS.DST/00000225/0000/00000225_00000001_1.allstreams.dst
dirac-dms-add-replication --Production 259:268 --FileType RADIATIVE.DST --Plugin LHCbMCDSTBroadcastRandom --Request 30
dirac-dms-add-replication --Production 239 --FileType ALLSTREAMS.DST --Plugin LHCbMCDSTBroadcastRandom --Request 29
Transformation 273 created
Name: Replication-ALLSTREAMS.DST-239-Request29 , Description: LHCbMCDSTBroadcastRandom of ALLSTREAMS.DST for productions 239
BK Query: {'FileType': ['ALLSTREAMS.DST'], 'ProductionID': ['239'], 'Visibility': 'Yes'}
3 files found for that query
Plugin: LHCbMCDSTBroadcastRandom
RequestID: 29
[lxplus433] x86_64-slc5-gcc43-opt /afs/cern.ch/lhcb/software/DEV/LHCBDIRAC/LHCBDIRAC_v6r0-pre12> dirac-bookkeeping-production-informations 239Production Info::
Configuration Name: certification
Configuration Version: test
Event type: 12143001
StepName: MCMerging10
ApplicationName : LHCb
ApplicationVersion : v31r7
OptionFiles : $STDOPTS/PoolCopy.opts
DDB : head-20101206
CONDDB : sim-20101210-vc-md100
ExtraPackages :None
Number of Steps 4
Total number of files: 8
LOG:4
ALLSTREAMS.DST:4
Number of events
File Type Number of events Event Type EventInputStat
ALLSTREAMS.DST 540 12143001 540
Path: /certification/test/Beam3500GeV-VeloClosed-MagDown-Nu3/MC10Sim01-Trig0x002e002aFlagged/Reco08/Stripping12Flagged
/certification/test/Beam3500GeV-VeloClosed-MagDown-Nu3/MC10Sim01-Trig0x002e002aFlagged/Reco08/Stripping12Flagged/12143001/ALLSTREAMS.DST
dirac-bookkeeping-production-files 239 ALLSTREAMS.DST
FileName Size GUID Replica
/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000044_1.allstreams.dst 14515993 165DD5A9-1D40-E011-AD80-003048F1E1E0 Yes
/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000045_1.allstreams.dst 2971054 988731FC-1C40-E011-AFCD-90E6BA442F3B Yes
/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000074_1.allstreams.dst 202748580 E2BAF0A1-A340-E011-BF97-003048F1B834 Yes
/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000076_1.allstreams.dst 2804277 F086C525-EB43-E011-96F9-001EC9D8B181 Yes
[lxplus433] x86_64-slc5-gcc43-opt /afs/cern.ch/lhcb/software/DEV/LHCBDIRAC/LHCBDIRAC_v6r0-pre12> dirac-dms-lfn-replicas /lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000044_1.allstreams.dst
{'Failed': {},
'Successful': {'/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000044_1.allstreams.dst': {'CERN_MC_M-DST': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000044_1.allstreams.dst'}}}
[lxplus433] x86_64-slc5-gcc43-opt /afs/cern.ch/lhcb/software/DEV/LHCBDIRAC/LHCBDIRAC_v6r0-pre12> dirac-dms-lfn-replicas /lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000045_1.allstreams.dst
{'Failed': {},
'Successful': {'/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000045_1.allstreams.dst': {'CNAF_MC_M-DST': 'srm://storm-fe-lhcb.cr.cnaf.infn.it/t1d1/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000045_1.allstreams.dst'}}}
[lxplus433] x86_64-slc5-gcc43-opt /afs/cern.ch/lhcb/software/DEV/LHCBDIRAC/LHCBDIRAC_v6r0-pre12> dirac-dms-lfn-replicas /lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000074_1.allstreams.dst
{'Failed': {},
'Successful': {'/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000074_1.allstreams.dst': {'CERN_MC_M-DST': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000074_1.allstreams.dst'}}}
[lxplus433] x86_64-slc5-gcc43-opt /afs/cern.ch/lhcb/software/DEV/LHCBDIRAC/LHCBDIRAC_v6r0-pre12> dirac-dms-lfn-replicas /lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000076_1.allstreams.dst
{'Failed': {},
'Successful': {'/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000076_1.allstreams.dst': {'CNAF_MC_M-DST': 'srm://storm-fe-lhcb.cr.cnaf.infn.it/t1d1/lhcb/certification/test/ALLSTREAMS.DST/00000239/0000/00000239_00000076_1.allstreams.dst'}}}
How to enable/disable FTS channel ? To check TFS transfer, look at the log for DataManagement/FTSSubmitAgent
Specific tests¶
Every release is somewhat special, and introduce new features that should be tested. It has to be noted that developers should always participate in the testing of very specific new developments, anyway the certification manager should look into if these tests have been done.
Within Jira, there is a special board, named ready for integration. that contain tasks marked as “Resolved”, but not yet “Done”. Dragging tasks from left to right will mark them as “Done”.
So, the certification manager can decide to investigate directly, by submitting tests, if know, or ask the developer to confirm the task can be closed.