The paper's main motivation is predicting whether a social media post will become viral or popular.
In more concrete terms:
[PERSONAL TAKE] Even though their methods rely on "classical ML models", their goal is to present a framework or template for dealing with social media cascades.
- Collection of public tweet cascades that contain Common Vulnerabilities and Exposures (CVE)
- Use follower data and timestamps to reconstruct conversations
- CVE, same as Twitter
- Also a cryptocurrency-related dataset
- features about the current tree: initial size, depth
- features about an individual node
- parent delay: relative difference in steps between a new node and its parent
- derived only from the author of the conversation's initial post
- i.e., user information
- limited to the root of the conversation
- use of fastText, a text embedding library from Facebook AI Research (FAIR)
- time and day of week of the root post
- not used in node placement task
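To make the structural and temporal features above concrete, here is a minimal sketch of how they could be computed. It is not the paper's implementation. The function names and the parent-list encoding (`parents[i]` is the arrival index of node `i`'s parent, with `None` for the root) are assumptions for illustration, as is the reading of "parent delay" as the difference in arrival steps between a node and its parent.

```python
from datetime import datetime, timezone


def tree_features(parents):
    """Size and depth of a conversation tree.

    Assumes nodes are listed in arrival order, so every parent index
    is smaller than the index of its child (hypothetical encoding).
    """
    depths = []
    for parent in parents:
        # The root has depth 0; every other node sits one level below its parent.
        depths.append(0 if parent is None else depths[parent] + 1)
    return {"size": len(parents), "depth": max(depths)}


def parent_delays(parents):
    """Arrival-step difference between each non-root node and its parent."""
    return [i - p for i, p in enumerate(parents) if p is not None]


def root_time_features(unix_ts):
    """Hour and day of week (Monday = 0) of the root post, in UTC."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {"hour": dt.hour, "day_of_week": dt.weekday()}


# Toy conversation: node 0 is the root, nodes 1 and 2 reply to it,
# node 3 replies to node 1, node 4 replies to node 3.
parents = [None, 0, 0, 1, 3]
print(tree_features(parents))
print(parent_delays(parents))
print(root_time_features(0))
```

The arrival-order encoding keeps the depth computation to a single pass, since a parent's depth is always known before its children are visited.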
Whole Tree Simulation
Cumulative node placement
There is plenty of previous work...
Node features to
- predict macroscopic characteristics;
- describe novelty, arrival patterns, textual expression, and social influence (Backstrom et al. 2013);
- DeepCas: predict logarithmic increments of the size of Twitter cascades (Li et al. 2017);
- SansNet: (based on survival analysis) predict whether cascades will become viral (Subbian et al. 2017).
Generative modelling to
- identify underlying mechanisms that reproduce traits of cascades;
- explore reaction times and lifespans of cascades by continuous-time dynamics (Wang et al. 2012);
- examine how the structural features of conversations are affected by their presentation in a platform interface (Aragón et al. 2017);
- capture different roles in cascade formation (Lumbreras 2016; Lumbreras et al. 2017).