DataManagement
For an introduction to DataManagement concepts, please see the introduction.
All the commands mentioned below accept several StorageElements and LFNs as parameters. Use --help for more details.
Basics
Check if a file is corrupted
This question usually arises when a job prints errors like:
Error in <TBufferFile::CheckByteCount>: object of class LHCb::PackedRelation read too few bytes: 2 instead of 1061133141
Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 21698:
1061133141 for a possible maximum of -6
Error in <TBufferFile::ReadClassBuffer>: class: DataObject, attempting to access a wrong version: -25634, object skipped at offset 4044
Or something like:
R__unzipLZMA: error 9 in lzma_code
Error in <TBasket::ReadBasketBuffers>: fNbytes = 28617, fKeylen = 92, fObjlen = 103039, noutot = 0, nout=0, nin=28525, nbuf=103039
Some applications have bugs that make them produce files they then cannot read back. From a pure data management point of view, however, a file is considered good if the checksum stored in the catalog matches the checksum of the actual file. To check this, download the file locally, compute its checksum, and compare it with the one stored in the DFC.
For example:
# Copy the file locally
# (the URL can be obtained from the failing job, or from dirac-dms-lfn-accessURL, or one can even download the file with dirac-dms-get-file)
bash-4.2$ gfal-copy root://xrootd.grid.surfsara.nl//pnfs/grid.sara.nl/data/lhcb/LHCb-Disk/lhcb/LHCb/Collision18/CHARMCHARGED.MDST/00077052/0000/00077052_00008134_1.charmcharged.mdst .
Copying root://xrootd.grid.surfsara.nl//pnfs/grid.sara.nl/data/lhcb/LHCb-Disk/lhcb/LHCb/Collision18/CHARMCHARGED.MDST/00077052/0000/00077052_00008134_1.charmcharged.mdst
[...]
Copying root://xrootd.grid.surfsara.nl//pnfs/grid.sara.nl/data/lhcb/LHCb-Disk/lhcb/LHCb/Collision18/CHARMCHARGED.MDST/00077052/0000/00077052_00008134_1.charmcharged.mdst... 69s 100% [====>]
Copying root://xrootd.grid.surfsara.nl//pnfs/grid.sara.nl/data/lhcb/LHCb-Disk/lhcb/LHCb/Collision18/CHARMCHARGED.MDST/00077052/0000/00077052_00008134_1.charmcharged.mdst [DONE] after 69s
# compute the checksum
bash-4.2$ xrdadler32 00077052_00008134_1.charmcharged.mdst
7f84828f 00077052_00008134_1.charmcharged.mdst
# Compare it with the DFC
bash-4.2$ dirac-dms-lfn-metadata /lhcb/LHCb/Collision18/CHARMCHARGED.MDST/00077052/0000/00077052_00008134_1.charmcharged.mdst
Successful :
/lhcb/LHCb/Collision18/CHARMCHARGED.MDST/00077052/0000/00077052_00008134_1.charmcharged.mdst :
Checksum : 7f84828f
ChecksumType : Adler32
CreationDate : 2018-08-16 02:53:48
FileID : 368851163
GID : 2749
GUID : 4606A403-E5A0-E811-ACFA-001B21B993CC
Mode : 775
ModificationDate : 2018-08-16 02:53:48
Owner : fstagni
OwnerGroup : lhcb_data
Size : 5059895291
Status : AprioriGood
UID : 19727
In the example above both values are 7f84828f, so the file is considered good. If the checksums do not match, the file must be recovered. If it is one of your own user files, handle it as you see fit (remove, recreate, replicate, etc.). If it is centrally managed, please contact the lhcb-datamanagement mailing list.
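The manual comparison described in this section can be scripted. The bash sketch below is not part of LHCbDIRAC: the function names normalize_adler and check_lfn_checksum are hypothetical. It assumes that xrdadler32 prints "<checksum> <file>" and that dirac-dms-lfn-metadata prints a "Checksum : <value>" line, as in the example output. Both values are normalized to 8 lowercase hex digits, on the assumption that the catalog value might differ in case or lack leading zeros.

```shell
#!/usr/bin/env bash

# normalize_adler VALUE (hypothetical helper)
# Prints an Adler32 checksum as 8 lowercase hex digits, so that e.g.
# "7F84828F", "0x7f84828f" and "7f84828f" compare equal, and a value
# missing its leading zeros is padded. Returns 1 if VALUE is not hex.
normalize_adler() {
  local value="${1#0x}"
  [[ "$value" =~ ^[0-9a-fA-F]{1,8}$ ]] || return 1
  printf '%08x\n' "$((16#$value))"
}

# check_lfn_checksum LFN LOCAL_FILE (hypothetical helper)
# Compares the Adler32 of a local copy with the checksum in the DFC.
# Assumes the output formats of xrdadler32 and dirac-dms-lfn-metadata
# shown in the example above. Returns 0 on match, 1 on mismatch,
# 2 if either checksum could not be read.
check_lfn_checksum() {
  local lfn="$1" local_file="$2" local_sum catalog_sum

  # xrdadler32 prints "<checksum> <file>": keep the first field
  local_sum=$(xrdadler32 "$local_file" | awk '{print $1}')
  # Keep the value of the "Checksum : <value>" line (not "ChecksumType")
  catalog_sum=$(dirac-dms-lfn-metadata "$lfn" | awk '$1 == "Checksum" {print $3}')

  local_sum=$(normalize_adler "$local_sum") || { echo "cannot read local checksum" >&2; return 2; }
  catalog_sum=$(normalize_adler "$catalog_sum") || { echo "cannot read catalog checksum" >&2; return 2; }

  if [[ "$local_sum" == "$catalog_sum" ]]; then
    echo "OK: checksums match ($local_sum)"
  else
    echo "MISMATCH: local $local_sum, catalog $catalog_sum"
    return 1
  fi
}
```

Once the file has been copied locally, a call such as check_lfn_checksum <LFN> <local file> prints OK or MISMATCH; a non-zero return code signals a mismatch or an unreadable checksum, which makes the helper usable in loops over several LFNs.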