I usually find myself reading more than 100 tweets per day, so I thought that using all that ‘reading power’ to read a book through tweets could be a good idea.
During the ‘25 de Mayo’ holiday I made the first version of this crazy idea, and I called it Took, a mix between Twitter and Book (yes, I know it’s a bad name 😄).
In this post I will explain how I made Took using Python and Tweepy, how I hosted it on Heroku (and later on a Raspberry Pi), and whether it was actually useful to read a book on Twitter using this bot.
Are you looking for the code 👀? You can check the Took GitHub repository here.
Coding a Twitter bot
In this section I will explain how Took works and how it has changed since the first version.
At the beginning the idea was simple: take the book I want to tweet, put all the book’s sentences into a list, verify that each sentence is 280 characters or fewer, and tweet one sentence every 30 minutes. This simple initial logic is still the core of Took; only some minor changes have been made.
To explain in depth how the bot works, we need to start with the book-preprocessing.py file. This file is in charge of pre-processing the book: it contains the utility functions that divide the book into tweetable sentences.
The main function here is tweetify_book. First it converts the book into a list of sentences using the function txt_to_list_of_sentences, which relies on the sent_tokenize function of the nltk library to do the job in a simple and readable way.
    import nltk

    # Converts a .txt to a list of sentences to tweet
    def txt_to_list_of_sentences(txt_name):
        with open(txt_name, 'r') as file:
            # Join the lines so sentences that span line breaks stay whole
            data = file.read().replace('\n', ' ')
        return nltk.tokenize.sent_tokenize(data)
Once txt_to_list_of_sentences generates that list of sentences, the tweetify_book function moves on and verifies whether each sentence can be tweeted, that is, whether it is 280 characters or fewer. If a sentence is longer than 280 characters, the function make_sentence_available_for_tweet is called to re-tokenize that sentence into pieces that can be tweeted.
    # Takes a sentence and replaces , with . so it can be tokenized again and tweeted.
    def make_sentence_available_for_tweet(sentence):
        sentence = sentence.replace(",", ".")
        return nltk.tokenize.sent_tokenize(sentence)
Let’s see an example:
Adell was just drunk enough to try, just sober enough to be able to phrase the necessary symbols and operations into a question which, in words, might have corresponded to this: Will mankind one day without the net expenditure of energy be able to restore the sun to its full youthfulness even after it had died of old age? Or maybe it could be put more simply like this: How can the net amount of entropy of the universe be massively decreased? Multivac fell dead and silent.
txt_to_list_of_sentences will do nothing with this text because there is no ‘.’ inside it to tokenize on, so the full 476-character text goes into the list of sentences as a single item. When verification time comes, the function make_sentence_available_for_tweet will be called and it will convert the previous text into this:
    ["Adell was just drunk enough to try",
     "just sober enough to be able to phrase the necessary symbols and operations into a question which",
     "in words",
     "might have corresponded to this: Will mankind one day without the net expenditure of energy be able to restore the sun to its full youthfulness even after it had died of old age?",
     "Or maybe it could be put more simply like this: How can the net amount of entropy of the universe be massively decreased?",
     "Multivac fell dead and silent."]
Now the resulting sentences are ready to tweet, since all of them are 280 characters or fewer. The magic is in this line inside the function:
    sentence = sentence.replace(",", ".")
It converts every ‘,’ into ‘.’, letting us tokenize that sentence again using nltk.
In Python tokenization basically refers to splitting up a larger body of text into smaller lines, words or even creating words for a non-English language. - Source: Tutorials Point
Covering how nltk works is out of the scope of this article. If you want to read more, you can go to nltk.org.
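One caveat of the comma trick: it does not guarantee that every resulting piece fits in a tweet, since a long clause without commas would stay long. The following is only a minimal sketch of a possible fallback, not something the post shows Took doing. The helper name split_to_fit and the constant MAX_TWEET_LENGTH are hypothetical, and the splitting relies on textwrap.wrap from the Python standard library.

```python
import textwrap

MAX_TWEET_LENGTH = 280  # the character limit used throughout the post


def split_to_fit(piece, limit=MAX_TWEET_LENGTH):
    """Hypothetical fallback: break a piece on word boundaries so every chunk fits."""
    if len(piece) <= limit:
        return [piece]
    # textwrap.wrap splits on whitespace and keeps each chunk within `width`
    return textwrap.wrap(piece, width=limit)
```

A piece that already fits comes back unchanged as a one-item list, so the helper could be applied to every output of make_sentence_available_for_tweet without special cases.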
After tokenizing all the sentences of the book, the function tweetify_book will create a .txt file where each line is a tweet ready to be sent. You can see an example here.
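The post does not show the body of tweetify_book, so the following is only a rough sketch of the flow described above, under stated assumptions. The names tweetify_sentences and write_tweets are hypothetical, and the splitter is passed in as a parameter so the sketch runs without nltk; in Took that role would be played by make_sentence_available_for_tweet.

```python
MAX_TWEET_LENGTH = 280  # the character limit used throughout the post


def tweetify_sentences(sentences, split_long_sentence, limit=MAX_TWEET_LENGTH):
    """Keep sentences that fit; send longer ones through split_long_sentence."""
    tweets = []
    for sentence in sentences:
        if len(sentence) <= limit:
            tweets.append(sentence)
        else:
            # The splitter returns a list of smaller pieces
            tweets.extend(split_long_sentence(sentence))
    return tweets


def write_tweets(tweets, output_path):
    """Write one tweet per line, the layout that main.py reads back."""
    with open(output_path, "w", encoding="utf-8") as file:
        for tweet in tweets:
            file.write(tweet.strip() + "\n")
```

Keeping the splitting step separate from the file writing makes it easy to inspect the list of tweets before anything is saved.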
Now let’s focus on the main.py file, where all the magic happens.
If you take a look at the file, you will see that the first lines are just boring constants used to store the Twitter API keys. I will not cover how to use the Twitter API, so let’s skip this part of the code and go straight to the end of the file, specifically where we open our already tokenized book.
The while loop at the end of main.py is where everything happens. Let’s take a look:
    # Tweets the book, one line every TIME_DELAY seconds
    with open('tweetify-books/tweetify-' + BOOK_TO_TWEET, 'r') as file:
        tweets = file.readlines()

    # Use < (not <=) so the index never runs past the last line
    while current_index < len(tweets):
        original_tweet_id = get_last_tweet_id()
        client.create_tweet(text=tweets[current_index], in_reply_to_tweet_id=original_tweet_id)
        print(tweets[current_index])
        current_index = current_index + 1
        update_index(current_index)
        time.sleep(TIME_DELAY)
Basically, the loop runs while we have tweets left to send. Inside the loop, the program gets the id of the last tweet using the function get_last_tweet_id and saves it in a variable called original_tweet_id. Why? Because each new tweet has to be sent as a reply to the previous one, which is what builds the Twitter thread where the book will be available to read.
    # Returns the id of the last tweet, used to build the thread
    def get_last_tweet_id():
        # Get the account's tweets; the first item is the most recent one
        tweets = client.get_users_tweets(id=USER_ID, tweet_fields=['context_annotations','created_at','geo'], user_auth=True)
        # tweets.data is a list of tweets, so take the first element's id
        return tweets.data[0].id
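As a side note, a thread can also be chained without asking the API for the last tweet on every iteration. The sketch below is an alternative, not how Took does it. It assumes that the response returned by tweepy’s create_tweet exposes the new tweet’s id as response.data["id"]; the function name tweet_thread and its parameters are hypothetical.

```python
def tweet_thread(client, tweets, first_reply_to=None):
    """Post tweets as one thread, each one replying to the previous tweet."""
    previous_id = first_reply_to
    posted_ids = []
    for text in tweets:
        response = client.create_tweet(text=text, in_reply_to_tweet_id=previous_id)
        # Assumption: the response carries the id of the tweet just created
        previous_id = response.data["id"]
        posted_ids.append(previous_id)
    return posted_ids
```

The trade-off is that after a restart the id of the last tweet is no longer in memory, so it would have to be looked up once (as get_last_tweet_id does) or saved to disk.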
After that, the current tweet is sent using tweepy and the variable current_index is incremented by one. Let’s pause here for a moment and talk about why the program should save the current index.
The function that is called next, update_index, is in charge of saving the index of the most recent tweet into a local txt file. What is this for? We need to save the position of the last tweet sent because, if the program stops running, the bot has to resume from the last sentence it tweeted and not from the beginning.
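The post does not show how the index is stored, so here is a minimal sketch of one way to do it. The file name current_index.txt, the extra path parameter and the read_index helper are assumptions made for illustration; only the name update_index comes from the code above.

```python
import os

INDEX_FILE = "current_index.txt"  # hypothetical file name


def read_index(path=INDEX_FILE):
    """Return the saved index, or 0 when nothing has been saved yet."""
    if not os.path.exists(path):
        return 0
    with open(path, "r") as file:
        content = file.read().strip()
    return int(content) if content else 0


def update_index(current_index, path=INDEX_FILE):
    """Overwrite the file with the index of the next tweet to send."""
    with open(path, "w") as file:
        file.write(str(current_index))
```

With something like this, main.py could set current_index = read_index() at startup, so a restart continues from the saved position.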
After tweeting, the bot waits 30 minutes before repeating the process. And that is all the logic behind Took: a simple and fun Twitter bot.
Remember that if you want to see the complete code you can check the Took GitHub repository here.
Hosting a Twitter bot
In this section I will be briefly discussing how I hosted Took, and the different options I have tried.
When I released the first version of Took I was using Heroku. At the time I thought it would be great for the job, and I chose it because I had used it in older projects.
Heroku was the home of Took for some days, but after some time I decided to use my own Raspberry Pi to do the job.
Hosting the bot on a Raspberry Pi was pretty straightforward. I just cloned the repository onto the device, installed the necessary dependencies and voilà! It was hosted perfectly and running 24/7.
The Raspberry Pi and Heroku are just two options in the vast sea of hosting services available for a bot. Feel free to do your own research and find the best option for your work.
Final Thoughts about Took
Let’s talk about Took’s usefulness and what I learned developing it.
Took’s development began on the 25th of May 2022, and some days later, on the 27th, I tweeted about Took’s existence from my main account. That tweet caught some attention, and Took’s account reached 80 followers that day.
After some testing Took started tweeting the first book on June 15th, and finished it on August 6th. It is important to say that the bot was inactive for a lot of days due to testing and improving stuff, and that is why there is a time gap in some tweets.
“The Last Question” by Isaac Asimov was the book that Took tweeted. The book has a length of 398 tweets. Please note that Took is for educational purposes only, and I never intended to violate any copyright. For future books I have been thinking about using books from Project Gutenberg exclusively.
Did Took meet its objective?
Yes. The book was completely tweeted in a single thread.
Is a Twitter thread a good way to read a full length book?
Let’s be honest: no. It is hard to read just one sentence at a time, and keeping track of the last sentence you read is difficult too (you can use Twitter likes for this purpose, but it is not very practical). Also, about 10 users unfollowed the bot because it was annoying to see a new tweet from Took on their timeline every 30 minutes.
Was it a good project to learn how to use the Twitter API and make something fun?
Absolutely yes, and a lot of people from Reddit and Twitter loved the idea. I encourage you to try to do something similar!
Is Took going to keep tweeting other books?
For the moment Took will be paused. Feel free to fork the code and start your own bot if you want to see something similar running!
That’s it! Feel free to reach out to me via Twitter DMs or email me at → tadeodonegana[at]gmail[dot]com