Bollenbacher, Pacheco, et al.
presented by Albert Orozco Camacho
The paper's main motivation is predicting whether a social media post will become viral or popular.
In more concrete terms:
[PERSONAL TAKE] Even though their methods use "classical ML models", their goal is to present a framework, or template, for dealing with social media cascades.
- Collection of public tweet cascades that mention Common Vulnerabilities and Exposures (CVE)
- Use follower data and timestamps to reconstruct conversations
- CVE, same as Twitter
- Also, cryptocurrency-related dataset
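The reconstruction step (follower data plus timestamps) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes one post per user and a common heuristic in which each post is attached to the most recent earlier poster that its author follows, falling back to the root. The names `reconstruct_parents`, `posts`, and `follows` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def reconstruct_parents(
    posts: List[Tuple[str, float]],
    follows: Dict[str, Set[str]],
) -> Dict[str, Optional[str]]:
    """Infer a parent user for each poster (None for the root).

    posts:   (user, timestamp) pairs, one post per user (assumption).
    follows: user -> set of users that this user follows.
    """
    ordered = sorted(posts, key=lambda p: p[1])
    root_user = ordered[0][0]
    parents: Dict[str, Optional[str]] = {root_user: None}
    seen: List[str] = [root_user]
    for user, _ in ordered[1:]:
        followed = follows.get(user, set())
        # Heuristic (assumption): most recent earlier poster this user follows,
        # otherwise attach directly to the root.
        parent = next((u for u in reversed(seen) if u in followed), root_user)
        parents[user] = parent
        seen.append(user)
    return parents


example_posts = [("a", 0.0), ("b", 1.0), ("c", 2.0), ("d", 3.0)]
example_follows = {"b": {"a"}, "c": {"a", "b"}, "d": set()}
example_parents = reconstruct_parents(example_posts, example_follows)
```

Here `c` follows both `a` and `b`, so it is attached to `b`, the more recent of the two.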
Structural Features
- features about the current tree: initial size, depth
- features about an individual node
- parent delay: relative difference in steps between a new node and its parent
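A small sketch of how these structural features might be computed. It assumes the tree is stored as a child-to-parent map and that a "step" is a node's arrival index; both are illustrative choices, and the helper names are hypothetical.

```python
from typing import Dict, Optional


def depth_of(node: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges from `node` up to the root."""
    depth = 0
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth


def tree_features(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Tree-level features: current size and maximum depth."""
    return {
        "size": len(parent),
        "depth": max(depth_of(n, parent) for n in parent),
    }


def parent_delay(node: str, parent: Dict[str, Optional[str]], step: Dict[str, int]) -> int:
    """Steps elapsed between a node's parent and the node itself (0 for the root)."""
    p = parent[node]
    return 0 if p is None else step[node] - step[p]


tree = {"r": None, "a": "r", "b": "a", "c": "r"}
arrival_step = {"r": 0, "a": 1, "b": 2, "c": 3}
features = tree_features(tree)
```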
User Features
- derived only from the author of the conversation's initial post
- i.e., user information
Content Features
- limited to the root of the conversation
- use of fastText, a document embedding method from FAIR
Temporal Features
- time and day of week of the root post
- not used in node placement task
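The temporal features of the root post could be extracted roughly like this. The sketch assumes a Unix timestamp interpreted in UTC and encodes the day of week as Monday = 0; the paper's exact encoding is not stated in these notes.

```python
from datetime import datetime, timezone
from typing import Dict


def temporal_features(root_timestamp: float) -> Dict[str, int]:
    """Hour of day and day of week (Monday = 0) of the root post, in UTC."""
    dt = datetime.fromtimestamp(root_timestamp, tz=timezone.utc)
    return {"hour": dt.hour, "day_of_week": dt.weekday()}


# 1970-01-04 05:00 UTC, three days and five hours after the epoch
sample = temporal_features(3 * 86400 + 5 * 3600)
```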
Given an initial tree of $k$ nodes (possibly, $k = 1$)...
Size Prediction
Node Placement
Size Prediction
Node Placement
Whole Tree Simulation
Cumulative node placement
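One way to read "cumulative node placement" is as a loop: starting from the initial tree, repeatedly place one new node until a target size is reached. The sketch below follows that reading. The target size would come from the size prediction step, and `uniform_placement` is only a placeholder for a learned node placement model; both names are hypothetical.

```python
import random
from typing import Callable, Dict, Optional

Tree = Dict[int, Optional[int]]  # node id -> parent id (None for the root)


def uniform_placement(tree: Tree, rng: random.Random) -> int:
    """Placeholder model: pick the parent uniformly among existing nodes."""
    return rng.choice(sorted(tree))


def simulate_tree(
    initial: Tree,
    target_size: int,
    place: Callable[[Tree, random.Random], int] = uniform_placement,
    seed: int = 0,
) -> Tree:
    """Grow `initial` to `target_size` nodes by repeated node placement."""
    rng = random.Random(seed)
    tree = dict(initial)
    next_id = max(tree) + 1
    while len(tree) < target_size:
        tree[next_id] = place(tree, rng)  # attach the new node to its chosen parent
        next_id += 1
    return tree


grown = simulate_tree({0: None}, target_size=5)
star = simulate_tree({0: None}, target_size=4, place=lambda tree, rng: 0)
```

Passing the placement model as a function keeps the simulation loop independent of how parents are actually chosen.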
There is plenty of previous work...
Cascades
- revealing common properties among popular/viral conversations; (Gómez et al. 2008; Boyd et al. 2010; Rossi and Magnani 2012; Dow et al. 2013; Weng et al. 2013)
- elusiveness of virality prediction (Salganik et al. 2006);
Node features to
- predict macroscopic characteristics;
- describe novelty, arrival patterns, textual expression, and social influence (Backstrom et al. 2013);
- DeepCas: predict logarithmic increments of the size of Twitter cascades (Li et al. 2017);
- SansNet: (based on survival analysis) predict whether cascades will become viral (Subbian et al. 2017).
Generative modelling to
- identify underlying mechanisms that reproduce traits of cascades;
- explore reaction times and lifespans of cascades by continuous-time dynamics (Wang et al. 2012);
- examine how the structural features of conversations are affected by their presentation in a platform interface (Aragón et al. 2017);
- capture different roles in cascade formation (Lumbreras 2016; Lumbreras et al. 2017).
Deep Methods on Cascades
- CAS2VEC: embeddings from sequences of event timestamps (Kefato et al. 2018);
- predict whether users in a social graph would participate in a cascade (Islam et al. 2018);
- RNNs and temporal point processes to predict sequences of events (Du et al. 2016).
Point Processes
- simulate cascades by fitting parameters to historical data (Shen et al. 2014);
- have been successfully applied to the study of Twitter cascades (Kobayashi and Lambiotte 2016);
- CAVEAT: most point processes don't work from just the initial post.
Hawkes Processes
- predict full tree structure from only the initial post (Krohn and Weninger 2019);
- CTPM is the only model directly comparable to the one presented in this paper!