(TEI)
Some of the reasons each set of tweets was collected:
🇧🇩 Bangladesh
Bengali tweets identified in 2019, focusing on regional political themes
🇮🇷 Iran
Reported malicious activity against a Twitter industry peer, via an influence campaign
🇷🇺 Russia
Accounts identified as potentially originating from the Internet Research Agency (IRA); Twitter CEO Jack Dorsey testified in 2018 about activity coming from this source.
🇻🇪 Venezuela
These accounts were part of a "foreign campaign of spammy content focused on divisive political themes".
🇨🇳 People’s Republic of China
A network of malicious actors, with 150,000 accounts (the amplifiers) designed to boost its content. They tweeted predominantly in Chinese languages and spread geopolitical narratives favorable to the Communist Party of China (CCP), while continuing to push deceptive narratives about the political dynamics in Hong Kong.
🇹🇷 Turkey
These accounts employed coordinated inauthentic activity to amplify political narratives favorable to the AK Parti, and demonstrated strong support for President Erdogan.
🇪🇸 Spain
These accounts were directly associated with the Catalan independence movement, specifically spreading content about the Catalan Referendum.
🇪🇬 Egypt
El Fagr network. The media group created inauthentic accounts to amplify messaging critical of Iran, Qatar and Turkey. Externally obtained information indicates it was taking direction from the Egyptian government.
🇦🇲 Armenia
These accounts were created to advance narratives that targeted Azerbaijan and were geostrategically favorable to the Armenian government.
Place of Origin | Year of Release | Earliest Activity | Latest Activity | Number of accounts |
---|---|---|---|---|
IRA | 2018 | 2009-05-12 09:37:00 | 2018-05-29 21:31:00 | 3608 |
Iran | 2019 | 2009-12-08 05:44:00 | 2018-11-28 14:47:00 | 3081 |
Russia | 2019 | 2011-05-29 14:46:00 | 2018-11-05 14:31:00 | 416 |
Iran | 2019 | 2008-04-30 12:29:00 | 2019-04-21 21:42:00 | 4716 |
China | 2019 | 2008-02-05 18:26:00 | 2019-08-28 01:31:00 | 5241 |
China | 2020 | 2018-01-11 09:29:00 | 2020-04-17 07:13:00 | 23750 |
Russia | 2020 | 2009-05-19 09:34:00 | 2019-12-12 13:26:00 | 1153 |
IRA | 2020 | 2020-01-09 08:56:00 | 2020-08-21 18:37:00 | 5 |
Iran | 2020 | 2020-01-08 18:30:00 | 2020-07-01 06:30:00 | 104 |
`tweetid`, `userid`, `user_display_name`, `user_screen_name`, `user_profile_url`, `follower_count`, `following_count`, `account_creation_date`, `account_language`, `tweet_language`, `tweet_text`, `tweet_time`, `tweet_client_name`, `in_reply_to_tweetid`, `in_reply_to_userid`, `quoted_tweet_tweetid`, `is_retweet`, `retweet_userid`, `retweet_tweetid`, `latitude`, `longitude`, `quote_count`, `reply_count`, `like_count`, `retweet_count`, `hashtags`, `urls`, `user_mentions`, `poll_choices`
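As a rough illustration of working with these fields, the sketch below counts original (non-retweet) tweets per `tweet_language` using only the standard library. It rests on assumptions that should be checked against the real files: that each extracted file is a CSV with a header row using the field names above, and that `is_retweet` is stored as the text `True`/`False`. The helper name `count_by_language`, the `unknown` label for an empty language, and the sample rows are all made up for the example.

```python
import csv
import io
from collections import Counter


def count_by_language(rows):
    """Count non-retweet rows per tweet_language value."""
    counts = Counter()
    for row in rows:
        # Assumption: is_retweet is serialized as the text "True" / "False".
        if row["is_retweet"].strip().lower() == "true":
            continue
        # Hypothetical label for rows whose language field is empty.
        counts[row["tweet_language"] or "unknown"] += 1
    return counts


# Tiny invented sample standing in for one extracted CSV file.
SAMPLE = io.StringIO(
    "tweetid,userid,tweet_language,is_retweet,tweet_text\n"
    "1,a,en,False,hello\n"
    "2,a,en,True,RT hello\n"
    "3,b,ru,False,privet\n"
    "4,b,,False,???\n"
)

counts = count_by_language(csv.DictReader(SAMPLE))
print(counts.most_common())
```

For a real file, the same function can be fed `csv.DictReader(open(path, newline="", encoding="utf-8"))`; reading row by row keeps memory use flat even for the larger releases.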
We have a programmatic solution (in Python 3) for downloading the complete dataset (please request access):
https://github.com/alorozco53/deep-trolls/blob/master/downloader.py
A code sample that downloads all zip files from the dataset's Google Cloud bucket:
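
    import os
    import subprocess
    from threading import Thread
    from zipfile import ZipFile

    BUCKET_PREFIX = 'gs://twitter-election-integrity/hashed'

    class DownloadThread(Thread):
        """Copy one zip archive from the bucket and extract it locally."""

        def __init__(self, path):
            super().__init__()
            self.path = path

        def run(self):
            print(f'Downloading {self.path}')
            # Drop only the leading scheme: str.lstrip('gs://') strips a set of
            # characters and could also eat the start of a bucket name.
            scheme = 'gs://'
            location = self.path[len(scheme):] if self.path.startswith(scheme) else self.path
            directory = os.path.dirname(location) or '.'
            os.makedirs(directory, exist_ok=True)
            # Requires an installed, authenticated gsutil; raises if the copy fails.
            subprocess.run(['gsutil', 'cp', self.path, location], check=True)
            print(f'Extracting {location}')
            with ZipFile(location, 'r') as zipObj:
                zipObj.extractall(path=directory)
            print(f'Downloaded {location}')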
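
The excerpt defines the worker thread but not how a set of them is driven. One possible approach, sketched here under the assumption that the list of `gs://` paths is already known, is to start the threads in small batches so that only a few copies run at once. `run_in_batches` and `batch_size` are hypothetical names for this illustration and are not taken from `downloader.py`; the stand-in workers below only append a number to a list so the sketch runs without `gsutil`.

```python
from threading import Thread


def run_in_batches(threads, batch_size=4):
    """Start threads batch_size at a time, waiting for each batch to finish."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    threads = list(threads)
    for start in range(0, len(threads), batch_size):
        batch = threads[start:start + batch_size]
        for t in batch:
            t.start()
        # Wait for the whole batch before starting the next one.
        for t in batch:
            t.join()


# Stand-in workers; a real run would pass DownloadThread(path) objects instead.
done = []
workers = [Thread(target=done.append, args=(i,)) for i in range(10)]
run_in_batches(workers, batch_size=3)
print(sorted(done))
```

Batching is the simplest way to cap concurrency with plain `Thread` objects; the trade-off is that each batch waits for its slowest member before the next one begins.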