Support for custom branches that contain std vectors of custom structs? #197

Open
oschulz opened this issue Dec 1, 2022 · 13 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@oschulz
Member

oschulz commented Dec 1, 2022

I have a root file with a custom-type branch whose elements contain structs that contain vectors of custom structs, e.g.

struct Foo
{
  long a;
  std::vector<short> b;
};

struct Bar
{
  long c;
  std::vector<Foo> d;
};

The file uses standard ROOT autogenerated streamers. I'm trying to read it using

struct Foo
    a::Clong
    b::Vector{Cshort}
end

struct Bar
    c::Clong
    d::Vector{Foo}
end

f = ROOTFile("myfile.root", customstructs = Dict("Foo" => Foo, "Bar" => Bar))
tree = LazyTree(f, "TreeOnFire", ["bar_branch"]);
tree[1].bar_branch; # fails

but I get

julia> tree[1].bar_branch;
ERROR: MethodError: no method matching -(::Nothing, ::Int64)
[...]
Stacktrace:
 [1] _localindex_newbasket!(ba::LazyBranch{Plane, UnROOT.Nojagg, Vector{Plane}}, idx::Int64, tid::Int64)
[...]

Should this work or can't we handle custom structs like that automatically yet?

@tamasgal
Member

tamasgal commented Dec 1, 2022

This can be a bit tricky. We don't have much (read: any) automation for custom types yet. There are different ways of doing it; you could check out how I do it for some of the KM3NeT data structures here (we included that in UnROOT and its test suite for documentation purposes): https://github.com/JuliaHEP/UnROOT.jl/blob/master/test/runtests.jl#L439

The parsing action is defined here: https://github.com/JuliaHEP/UnROOT.jl/blob/master/src/custom.jl#L145

As you can see, it might require some manual bit-hopping. If you can provide some sample data, I can help you out.

@oschulz
Member Author

oschulz commented Dec 1, 2022

Thanks @tamasgal, much appreciated! Adding a bit of bit-mangling code shouldn't be a problem. So I basically implement readtype and interped_data for the custom types, right? How do I read/iterate over std::vector in those?

@tamasgal
Member

tamasgal commented Dec 1, 2022

For the std::vector, you need to skip the magical 10 bytes at the beginning and then use the UnROOT.readtype(io, Cshort) function. It's similar to read() but converts the byte order (ROOT is big-endian).
It might need some trial and error, so let me know if you need further help, but I think it should be fairly straightforward. ;)
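
As a rough illustration of the "skip 10 bytes, then read big-endian values" recipe above, here is a minimal sketch in plain Julia that does not depend on UnROOT. The helper names `read_be` and `read_short_vector` are hypothetical, and the sketch assumes that one entry holds exactly one vector whose 10 header bytes are followed only by the elements; real files may need adjustments.

```julia
# Hypothetical helper (not part of UnROOT): read one big-endian value of type T.
read_be(io::IO, ::Type{T}) where {T} = ntoh(read(io, T))

# Sketch: decode the bytes of a single std::vector<short> entry.
# Assumption: 10 header bytes, then nothing but big-endian Int16 elements.
function read_short_vector(entry::Vector{UInt8})
    io = IOBuffer(entry)
    skip(io, 10)                 # the 10 header bytes mentioned above
    out = Int16[]
    while !eof(io)
        push!(out, read_be(io, Int16))
    end
    return out
end

# Build a synthetic entry to exercise the function.
buf = IOBuffer()
write(buf, zeros(UInt8, 10))     # dummy header
for x in (1, -2, 300)
    write(buf, hton(Int16(x)))   # store each element in big-endian order
end
entry = take!(buf)

decoded = read_short_vector(entry)
```

Inside a real parser, UnROOT.readtype(io, Cshort) would take the place of the hypothetical `read_be`.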

@tamasgal tamasgal added enhancement New feature or request question Further information is requested labels Dec 1, 2022
@Moelf
Member

Moelf commented Dec 1, 2022

The documentation is split between https://juliahep.github.io/UnROOT.jl/dev/advanced/custom_branch/ and src/custom.jl.

basically, you want to implement a function

function interped_data(rawdata, rawoffsets, ::Type{Vector{LVF64}}, ::Type{Offsetjagg})

but with your own type instead of LVF64
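
To give a feel for what such a method has to do with `rawdata` and `rawoffsets`, here is a self-contained sketch that does not call UnROOT. The function `decode_entries` is a hypothetical stand-in, not the library's implementation. It assumes that `rawoffsets` holds 0-based byte offsets with one trailing end offset, and that each entry begins with a fixed number of header bytes.

```julia
# Hypothetical stand-in for a custom interped_data method; it does not use UnROOT.
# Assumptions: `rawoffsets` holds 0-based byte offsets plus one trailing end
# offset, and every entry starts with `header` bytes that carry no elements.
function decode_entries(rawdata::Vector{UInt8}, rawoffsets::AbstractVector{<:Integer},
                        ::Type{T}; header::Int = 10) where {T}
    entries = Vector{Vector{T}}()
    for i in 1:length(rawoffsets)-1
        start = rawoffsets[i] + header + 1   # first payload byte (1-based index)
        stop = rawoffsets[i+1]               # last byte of this entry
        payload = rawdata[start:stop]        # empty when the entry has no elements
        # Reinterpret the bytes as T and convert from big-endian to host order.
        push!(entries, ntoh.(reinterpret(T, payload)))
    end
    return entries
end

# Synthetic data: one entry with two Int32 values, followed by one empty entry.
rawdata = UInt8[]
append!(rawdata, zeros(UInt8, 10))
append!(rawdata, reinterpret(UInt8, hton.(Int32[7, 8])))
append!(rawdata, zeros(UInt8, 10))
rawoffsets = [0, 18, 28]

entries = decode_entries(rawdata, rawoffsets, Int32)
```

A real method would dispatch on your own type in place of `T` and return whatever container fits your data.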

@soudk

soudk commented May 25, 2023

I am having some trouble figuring this out. Could someone help?

I basically have a std::vector<std::vector<int>> in a ROOT tree that I'm trying to read, which I didn't think would be too bad, since the TLorentzVector case is a vector of 4-vectors too... In my case, the length of the vectors differs from event to event.

I tried doing the following:

customstruct = Dict("VecVecInt" => Vector{Vector{Int32}})

const VecVecInt = customstruct
function interped_data(rawdata, rawoffsets, ::Type{Vector{Vector{Int32}}}, ::Type{Offsetjagg})
    _size = 64 # needs to account for 32 bytes header
    dp = 0 # book keeping for copy_to!
    lr = length(rawoffsets)
    offset = Vector{Int32}(undef, lr)
    offset[1] = 0
    @views @inbounds for i in 1:lr-1
        start = rawoffsets[i]+10+1
        stop = rawoffsets[i+1]
        l = stop-start+1
        if l > 0
            unsafe_copyto!(rawdata, dp+1, rawdata, start, l)
            dp += l
            offset[i+1] = offset[i] + l
        else
            offset[i+1] = offset[i]
        end
    end
    resize!(rawdata, dp)
    real_data = interped_data(rawdata, offset, VecVecInt, Nojagg)
    offset .÷= _size
    offset .+= 1
    VectorOfVectors(real_data, offset)
end

The error I get when running:

data, offsets = UnROOT.array(f, "Tree/Event/PMTBinnedWaveforms", raw=true)

(where PMTBinnedWaveforms is the std::vector<std::vector<int>> I am trying to read)

is

MethodError: no method matching ROOTFile(::String, ::Dict{String, DataType})

Closest candidates are:
  ROOTFile(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
   @ UnROOT ~/.julia/packages/UnROOT/mBdWz/src/root.jl:13
  ROOTFile(::Function, ::Any...; pv...)
   @ UnROOT ~/.julia/packages/UnROOT/mBdWz/src/root.jl:25
  ROOTFile(::String, ::Int32, ::Union{UnROOT.FileHeader32, UnROOT.FileHeader64}, ::Union{UnROOT.HTTPStream, UnROOT.MmapStream, UnROOT.XRDStream}, ::Union{UnROOT.TKey32, UnROOT.TKey64}, ::UnROOT.Streamers, ::UnROOT.ROOTDirectory, ::Dict{String, Type})
   @ UnROOT ~/.julia/packages/UnROOT/mBdWz/src/root.jl:13
  ...

Stacktrace:
 [1] top-level scope
   @ In[6]:2

I am really stuck. Could someone help?
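
One possible reading of the MethodError above is that the Dict reached ROOTFile as a second positional argument, whereas the first post in this thread passes it via the `customstructs` keyword. The thread does not confirm that this alone resolves the problem. The sketch below uses a hypothetical `open_file` function, not UnROOT itself, to show how Julia treats the two call forms.

```julia
# Hypothetical stand-in for a constructor that takes its options as keywords.
open_file(path::String; customstructs::Dict = Dict{String,Type}()) =
    (path = path, customstructs = customstructs)

structs = Dict("VecVecInt" => Vector{Vector{Int32}})

# Keyword form: matches the method with one positional argument.
ok = open_file("myfile.root"; customstructs = structs)

# Positional form: no two-argument method exists, so Julia throws a MethodError.
err = try
    open_file("myfile.root", structs)
catch e
    e
end
```

Here `err` ends up holding a MethodError, similar in shape to the one reported above.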

@tamasgal
Member

Ahm, do you have an example file? That should work "out-of-the-box" 🙈

@soudk

soudk commented May 25, 2023

Sure, actually here is one: https://drive.google.com/drive/folders/1qLURkYheLkdwoEj_tyGLG7JsV6wShSGt?usp=sharing

I'm trying to read PMTBinnedWaveforms and PMTWaveforms under ODTree.

@Moelf
Member

Moelf commented May 25, 2023

julia> ROOTFile("/tmp/VetoPMTAnalysis_000.root")["ODTree"]
ODTree (TTree)
└─ "ODEvent"


julia> ROOTFile("/tmp/VetoPMTAnalysis_000.root")["ODTree"]["ODEvent"]
ODEvent
├─ TObject
│  ├─ fUniqueID
│  └─ fBits
├─ eventNumber
├─ muImpactParameter
├─ LXeImpactParameter
├─ muTrackLength
├─ muEnergy
├─ totalHits
├─ totalHitsPreQE
├─ initCherenkovOP
├─ PMTIDVec
├─ PMTWaveforms
├─ PMTBinnedWaveforms
└─ PMTTriggerVec

So your TTree contains a custom struct; in this case it's tricky.

@tamasgal
Member

tamasgal commented May 25, 2023

It's reading

  fClassName: String "ODPMTDS"
  fParentName: String "ODPMTDS"

and you can check the streamer for that class with UnROOT.streamerfor(f, "ODPMTDS") (see below the output).

The problem is that the branch splitting is limited in your case. The default split level is 99, which means that you basically get a ROOT branch with a corresponding path for each field. With limited splitting, you need a parser that is able to parse the whole class instance. This means that you cannot read just a single field such as PMTBinnedWaveforms of ODPMTDS; you need to deserialise everything. 😞

julia> UnROOT.streamerfor(f, "ODPMTDS")
UnROOT.StreamerInfo(UnROOT.TStreamerInfo{UnROOT.TObjArray}("ODPMTDS", "", 0x14fb5c22, 1, UnROOT.TObjArray("", 0, Any[UnROOT.TStreamerBase
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "TObject"
 fTitle: String "Basic ROOT object"
 fType: Int32 66
 fSize: Int32 0
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, -1877229523, 0, 0, 0]
 fTypeName: String "BASE"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fBaseVersion: Int32 1
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "eventNumber"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "muImpactParameter"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "LXeImpactParameter"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "muTrackLength"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "muEnergy"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "totalHits"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "totalHitsPreQE"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "initCherenkovOP"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTIDVec"
 fTitle: String ""
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<int>"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 3
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTWaveforms"
 fTitle: String "All hits on PMTs"
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<vector<float> >"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 61
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTBinnedWaveforms"
 fTitle: String ""
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<vector<int> >"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 61
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTTriggerVec"
 fTitle: String ""
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<vector<int> >"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 61
])), Set(Any["TObject"]))

@Moelf
Member

Moelf commented May 25, 2023

Python uproot can parse it

In [16]: up
Out[16]: <module 'uproot' from '/home/akako/.conda/envs/hep/lib/python3.11/site-packages/uproot/__init__.py'>

In [17]: r = up.open("/tmp/VetoPMTAnalysis_000.root")["ODTree"].arrays()

In [18]: r.PMTBinnedWaveforms[0]
Out[18]: <Array [[0, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0], ...] type='472 * var * int32'>

but I don't think we can do much here at the moment; parsing arbitrary C++ classes without maximal splitting is too hard for now.

If you convert the TTree to an RNTuple, we should be able to read that easily.

@tamasgal
Member

Automatic parsing of custom stuff is definitely on the big todo list, but I am totally overloaded 😞 still hoping that a few more contributors jump in soon 🙂

@soudk

soudk commented May 25, 2023

Yeah, I was using Python uproot before, but I stumbled on UnROOT and really like it, hence the potential switch.

Thanks for the help! I'll try converting to an RNTuple and see; I don't really need the other TTree right now anyway.

@tamasgal
Member

Or set the branch splitting to 99 ;)
