Data 100, Fall 2023
You can run this notebook on the JupyterHub machines, but you will need to set up an OpenAI account. Alternatively, if you are running on your own computer, you can try running a model locally.
You can create a free account, which includes some initial free credits, by going here:
You will then need to get an API key. Save that API key to a local file called openai.key:
with open("openai.key", "w") as f:
    f.write("Your API Key")
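A key file written by hand or saved by an editor may end with a newline, which would otherwise be sent along as part of the key. Below is a minimal sketch of a more forgiving loader. It assumes the key is stored either in `openai.key` or in the `OPENAI_API_KEY` environment variable (the name the `openai` package conventionally reads). `load_api_key` is a hypothetical helper and is not part of LangChain or OpenAI.

```python
import os
from pathlib import Path


def load_api_key(path="openai.key", env_var="OPENAI_API_KEY"):
    """Return an API key read from `path`, falling back to the `env_var` environment variable.

    Hypothetical helper: strips surrounding whitespace (e.g. a trailing newline)
    and raises RuntimeError if no non-empty key can be found.
    """
    key_file = Path(path)
    if key_file.exists():
        key = key_file.read_text().strip()
        if key:
            return key
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    raise RuntimeError(f"No API key found in {path!r} or ${env_var}")
```

With this in place, `openai_key = load_api_key()` could replace reading the file by hand. Failing loudly is deliberate, because an empty key otherwise only surfaces later as an authentication error from the API.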
Uncomment the following line to install the required packages.
# !pip install openai langchain
from langchain.llms import OpenAI
with open("openai.key", "r") as f:
    openai_key = f.read().strip()  # drop any trailing newline from the key file
llm = OpenAI(openai_api_key=openai_key,
             model="gpt-3.5-turbo-instruct",
             # temperature=0,
             max_tokens=512)
llm("What is the capital of California?")
'\n\nThe capital of California is Sacramento.'
for chunk in llm.stream("Write a short song about data science and large language models."):
    print(chunk, end="", flush=True)
Verse 1:
Data science, it's the future we embrace
Using numbers to unlock the human race
From predictions to insights, we see it all
But now we've got something new, standing tall

Chorus:
Large language models, they're here to stay
With endless knowledge, they'll light the way
From GPT-3 to BERT, they're the real deal
In data science, they're the ultimate reveal

Verse 2:
With billions of parameters, they're learning fast
Analyzing text, from present to the past
They can generate, translate, and summarize
Making our tasks easier, no surprise

Chorus:
Large language models, they're here to stay
With endless knowledge, they'll light the way
From GPT-3 to BERT, they're the real deal
In data science, they're the ultimate reveal

Bridge:
But with great power, comes great responsibility
We must use them wisely, with integrity
For the future of AI, lies in our hands
Let's harness their potential, and make great plans

Chorus:
Large language models, they're here to stay
With endless knowledge, they'll light the way
From GPT-3 to BERT, they're the real deal
In data science, they're the ultimate reveal

Outro:
So let's embrace data science, and these models too
Together we can do things, we never thought we'd do
With endless possibilities, the future's bright
Data science and large language models, shining light.
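`llm.stream(...)` yields the response piece by piece, which is why the loop above prints with `end=""` and `flush=True`. If you also want the whole response as a single string once streaming finishes, you can accumulate the chunks as they arrive. The sketch below assumes the chunks are plain strings, as the printed output suggests. `fake_stream` is a hypothetical stand-in for `llm.stream` so the example runs without an API key.

```python
def fake_stream(prompt):
    """Hypothetical stand-in for llm.stream: yields a canned reply in 8-character pieces."""
    reply = "Data science, it's the future we embrace"
    for i in range(0, len(reply), 8):
        yield reply[i:i + 8]


def stream_and_collect(stream, prompt):
    """Print chunks as they arrive and return the full concatenated text."""
    pieces = []
    for chunk in stream(prompt):
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(pieces)


full_text = stream_and_collect(fake_stream, "Write a short song.")
```

With the real model, you would call `stream_and_collect(llm.stream, "Write a short song about data science and large language models.")`.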
You can download and install Ollama from:
Ollama runs models locally, on your own machine.
# from langchain.llms import Ollama
# from langchain.callbacks.manager import CallbackManager
# from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# vicuna = Ollama(
#     model="vicuna",
#     # temperature=0,
#     callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
# )
# vicuna("What is the capital of California? Answer with only one word.")
# vicuna("Write a short song about data science and large language models.")
import pandas as pd
tweets = pd.read_json("AOC_recent_tweets.txt")
list(tweets['full_text'][0:10])
['RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…', 'RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…', '(Source: https://t.co/3o5JEr6zpd)', 'Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU', 'What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id', 'Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat.', 'Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id', 'RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.…', 'RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog…', 'RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them.']
Suppose I wanted to evaluate whether a tweet is about a particular topic, such as the minimum wage:
prompt = """
Is the following text making a statement about minimum wage? You should answer either Yes or No.
{text}
Answer:
"""
questions = [prompt.format_map(dict(text=t)) for t in tweets['full_text'].head(20)]
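`format_map` substitutes each tweet into the `{text}` placeholder, so `questions` becomes a list of 20 complete prompts. Here is the same step on two sample strings. The first is a tweet from the dataset above. The second is a made-up string that shows braces inside the *substituted* text are safe, because only the template itself is parsed for placeholders. The `demo_` names are illustrative only and avoid overwriting the notebook's `prompt` and `questions`.

```python
demo_template = """
Is the following text making a statement about minimum wage? You should answer either Yes or No.
{text}
Answer:
"""

demo_texts = [
    "(Source: https://t.co/3o5JEr6zpd)",    # a tweet from the dataset above
    "A made-up tweet with {braces} in it",  # hypothetical text containing braces
]

# One complete prompt per text; format_map fills the {text} placeholder.
demo_questions = [demo_template.format_map(dict(text=t)) for t in demo_texts]
print(demo_questions[1])
```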
Ask each of the LLMs to answer the questions:
open_ai_answers = llm.batch(questions)
open_ai_answers
['\nYes', '\nYes', '\nNo', '\nNo', '\nNo', '\nNo', 'Yes', '\nNo', '\nNo', '\nYes', '\nNo', '\nNo', '\nNo', 'No', 'No', '\nNo', 'No', '\nNo', '\nYes', 'No ']
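The raw completions are not uniform. Some start with a newline, and one ends with a trailing space. `str.contains("Y")` would also flag any reply that merely contains a capital Y. A slightly stricter sketch normalizes each answer and maps it to `True`, `False`, or `None` when the model replied with something other than Yes or No. `parse_yes_no` is a hypothetical helper, and the sample list is the `open_ai_answers` output shown above.

```python
import pandas as pd


def parse_yes_no(answer):
    """Map a raw LLM completion to True (yes), False (no) or None (anything else)."""
    words = answer.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!:")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


# The raw completions returned by llm.batch(questions) above.
raw_answers = ['\nYes', '\nYes', '\nNo', '\nNo', '\nNo', '\nNo', 'Yes', '\nNo', '\nNo', '\nYes',
               '\nNo', '\nNo', '\nNo', 'No', 'No', '\nNo', 'No', '\nNo', '\nYes', 'No ']

parsed = pd.Series(raw_answers).map(parse_yes_no)
print(parsed.sum(), "of", len(parsed), "answered Yes")  # prints: 5 of 20 answered Yes
```

Returning `None` instead of `False` for unparseable replies keeps "the model said No" distinct from "the model did not follow the instructions", which matters when comparing models.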
# vicuna_answers = vicuna.batch(questions)
# vicuna_answers
pd.set_option('display.max_colwidth', None)
df = pd.DataFrame({"OpenAI": open_ai_answers,
                   # "Vicuna": vicuna_answers,
                   "Text": tweets['full_text'].head(20)})
df["OpenAI"] = df["OpenAI"].str.contains("Y")
# df["Vicuna"] = df["Vicuna"].str.contains("Y")
df
| | OpenAI | Text |
---|---|---|
0 | True | RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT… |
1 | True | RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent… |
2 | False | (Source: https://t.co/3o5JEr6zpd) |
3 | False | Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU |
4 | False | What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id |
5 | False | Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat. |
6 | True | Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id |
7 | False | RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.… |
8 | False | RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog… |
9 | True | RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them. |
10 | False | Amazon workers have the right to form a union.\n\nAnti-union tactics like these, especially from a trillion-dollar company trying to disrupt essential workers from organizing for better wages and dignified working conditions in a pandemic, are wrong. https://t.co/nTDqMUapYs |
11 | False | RT @WorkingFamilies: Voters elected Democrats to deliver more relief, not less. |
12 | False | We should preserve what was there and not peg it to outdated 2019 income. People need help! |
13 | False | If conservative Senate Dems institute a lower income threshold in the next round of checks, that could potentially mean the first round of checks under Trump help more people than the first round under Biden.\n\nDo we want to do that? No? Then let’s stop playing & just help people. |
14 | False | @iamjoshfitz 😂 call your member of Congress, they can help track it down |
15 | False | All Dems need for the slam dunk is to do what people elected us to do: help as many people as possible.\n\nIt’s not hard. Let’s not screw it up with austerity nonsense that squeezes the working class yet never makes a peep when tax cuts for yachts and private jets are proposed. |
16 | False | It should be $2000 to begin w/ anyway. Brutally means-testing a $1400 round is going to hurt so many people. THAT is the risk we can’t afford.\n\nIncome thresholds already work in reverse & lag behind reality. Conservative Dems can ask to tax $ back later if they’re so concerned. |
17 | False | We cannot cut off relief at $50k. It is shockingly out of touch to assert that $50k is “too wealthy” to receive relief.\n\nMillions are on the brink of eviction. Give too little and they’re devastated. Give “too much” and a single mom might save for a rainy day. This isn’t hard. https://t.co/o14r3phJeH |
18 | True | Imagine being a policymaker in Washington, having witnessed the massive economic, social, and health destruction over the last year, and think that the greatest policy risk we face is providing *too much* relief.\n\nSounds silly, right?\n\n$1.9T should be a floor, not a ceiling. |
19 | False | @AndrewYang @TweetBenMax @RitchieTorres Thanks @AndrewYang! Happy to chat about the plan details and the community effort that’s gone into this legislation. 🌃🌎 |
prompt = """
Is the following text self promoting? You should answer either Yes or No.
{text}
Answer:
"""
questions = [prompt.format_map(dict(text=t)) for t in tweets['full_text'].head(20)]
open_ai_answers2 = llm.batch(questions)
df2 = pd.DataFrame({"OpenAI": open_ai_answers2,
                    "Text": tweets['full_text'].head(20)})
# df2["OpenAI"] = df2["OpenAI"].str.contains("Y")
df2
| | OpenAI | Text |
---|---|---|
0 | \nNo | RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT… |
1 | \nNo | RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent… |
2 | No | (Source: https://t.co/3o5JEr6zpd) |
3 | \nNo | Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU |
4 | No | What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id |
5 | \nYes | Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat. |
6 | \nNo | Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id |
7 | \nNo | RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.… |
8 | \nYes | RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog… |
9 | \nNo | RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them. |
10 | \nNo | Amazon workers have the right to form a union.\n\nAnti-union tactics like these, especially from a trillion-dollar company trying to disrupt essential workers from organizing for better wages and dignified working conditions in a pandemic, are wrong. https://t.co/nTDqMUapYs |
11 | \nNo | RT @WorkingFamilies: Voters elected Democrats to deliver more relief, not less. |
12 | No | We should preserve what was there and not peg it to outdated 2019 income. People need help! |
13 | No | If conservative Senate Dems institute a lower income threshold in the next round of checks, that could potentially mean the first round of checks under Trump help more people than the first round under Biden.\n\nDo we want to do that? No? Then let’s stop playing & just help people. |
14 | No | @iamjoshfitz 😂 call your member of Congress, they can help track it down |
15 | \nYes | All Dems need for the slam dunk is to do what people elected us to do: help as many people as possible.\n\nIt’s not hard. Let’s not screw it up with austerity nonsense that squeezes the working class yet never makes a peep when tax cuts for yachts and private jets are proposed. |
16 | \nNo | It should be $2000 to begin w/ anyway. Brutally means-testing a $1400 round is going to hurt so many people. THAT is the risk we can’t afford.\n\nIncome thresholds already work in reverse & lag behind reality. Conservative Dems can ask to tax $ back later if they’re so concerned. |
17 | \nNo | We cannot cut off relief at $50k. It is shockingly out of touch to assert that $50k is “too wealthy” to receive relief.\n\nMillions are on the brink of eviction. Give too little and they’re devastated. Give “too much” and a single mom might save for a rainy day. This isn’t hard. https://t.co/o14r3phJeH |
18 | \nNo | Imagine being a policymaker in Washington, having witnessed the massive economic, social, and health destruction over the last year, and think that the greatest policy risk we face is providing *too much* relief.\n\nSounds silly, right?\n\n$1.9T should be a floor, not a ceiling. |
19 | \nYes | @AndrewYang @TweetBenMax @RitchieTorres Thanks @AndrewYang! Happy to chat about the plan details and the community effort that’s gone into this legislation. 🌃🌎 |
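The two experiments differ only in the question inside the prompt, so the pattern can be wrapped in a single function. This is a minimal sketch under a few assumptions:

* `classify_texts` is a hypothetical helper.
* `batch` is any callable that takes a list of prompts and returns a list of strings, such as `llm.batch`.
* Answers are parsed by their first word.

`fake_batch` stands in for the model so the example runs offline.

```python
import pandas as pd


def first_word_label(raw):
    """Return True for a reply starting with 'yes', False for 'no', None otherwise."""
    words = raw.strip().lower().split()
    first = words[0].strip(".,!:") if words else ""
    return {"yes": True, "no": False}.get(first)


def classify_texts(batch, question, texts):
    """Ask a yes/no `question` about every text; return raw and parsed answers.

    Hypothetical helper. `batch` maps a list of prompts to a list of strings
    (e.g. llm.batch). The template mirrors the prompts used above; `question`
    must not contain literal braces, since format_map would parse them.
    """
    template = "\n" + question + " You should answer either Yes or No.\n{text}\nAnswer:\n"
    prompts = [template.format_map(dict(text=t)) for t in texts]
    raw = batch(prompts)
    return pd.DataFrame({"Answer": [first_word_label(r) for r in raw],
                         "Raw": raw,
                         "Text": list(texts)})


def fake_batch(prompts):
    """Offline stand-in for llm.batch (hypothetical): answers Yes if 'Congress' appears."""
    return ["\nYes" if "Congress" in p else "\nNo" for p in prompts]


result = classify_texts(fake_batch, "Is the following text self promoting?",
                        ["(Source: https://t.co/3o5JEr6zpd)",
                         "@iamjoshfitz 😂 call your member of Congress, they can help track it down"])
```

With the real model, `classify_texts(llm.batch, "Is the following text self promoting?", tweets['full_text'].head(20))` builds exactly the same prompts as the cell above and returns both the raw replies and the parsed labels.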