![]() ![]() ![]() |
RAMADDA User Guide
|
We want to be able to search within the hierarchy, e.g., show me all of the radar files under the "ccs2" group. To do this we encode the hierarchical structure within the identifier of the groups. The relational model looks like:
name id parent_group_id group: Top 0 --- group: Comet 0:0 0 group: ccs1 0:0:0 0:0 entry: radar1 A 0:0:0 group: ccs2 0:0:1 0:0 entry: radar2 B 0:0:1 entry: radar3 C 0:0:1 group: ccs3 D 0:0Non-Group Entrys have a globally unique identifier generated by RAMADDA. However, Groups have a special identifier that allow for searching under sub-groups. A Group identifier is defined as:
<parent Group identifier>:<sibling number>For example, the top most group has identifier "0". Its children have identifiers:
0:0 0:1 0:2 ...The children groups of the 0:2 Group have identifiers:
0:2:0 0:2:1 0:2:2 ...By doing this we can make a search of the form:
select * from entries where parent_group_id like '0:0:1%'The problem with this approach is that the identifier of a group is based on its position within the hierarchy. When a group is moved to a new parent group we need to change its identifier. Kind of messy. If external applications have a reference to one of these groups based on its id then those references are bad.
Another problem with this approach is that the "like" search entails a full table scan.
One of the alternative approaches to encoding hierarchy in a relational model is to have an ancestor table that defines the complete hierarchy. e.g.,:
id ancestor 0 --- 1 0 2 (parent=1) 1 2 0 3 (parent=2) 2 3 1 3 0This brings scalability issues. The size of this table would be average tree depth*number of entries. So for example, we have ~3 million level3 radar files with the hierarchy:
Top IDD Radar Level3 Station Product Radar data1 Radar data2We would end up having 6*3 million entries in the ancestor table. If this was a relatively static data set then we could handle that but the data set is dynamic and we get 5 files/second coming into the repository and 5/second being removed.
One approach to dealing with the identifier issue is to have all entries/groups have a unique and permanent id along with a parent_id. To encode the hierarchy we then have another field, a hierarchy_id, which takes the form that we currently have. e.g.:
name id parent_group_id hierarchy_id group: Top A --- 0 group: Comet B A 0:0 group: ccs1 C B 0:0:0 entry: radar1 D C 0:0:0:0 group: ccs2 E B 0:0:1 entry: radar2 F E 0:0:1:0 entry: radar3 G E 0:0:1:1 group: ccs3 H B 0:0:2We still have bookkeeping costs when things are added and moved in that the hierarchy_id needs to change. We still have search costs in that we do:
select * from entries where hierarchy_id like '0:0:1%'But, each entity has a unique and permanent id.