Scene graph generation for human-object interaction detection seeks to convert visual data into a structured graph in which nodes represent humans and objects, and edges denote the nature of their ...