Lecture 24 – Data 100, Fall 2023¶

Getting Set Up¶

You can run this notebook on the JupyterHub machines, but you will need to set up an OpenAI account. Alternatively, if you are running on your own computer, you can try running a model locally.

Step 1. Create an OpenAI account¶

You can create a free account, which comes with some initial free credits, here:

https://platform.openai.com

You will then need to get an API key. Save that API key to a local file called openai.key:

In [17]:
with open("openai.key", "w") as f:
    f.write("Your API Key")
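
When the key is read back later, a stray newline or an empty file can cause confusing authentication errors. The following is a minimal sketch of a more defensive loader, assuming you want to fall back to an environment variable when the file is missing. The helper name `load_api_key` is hypothetical and is not part of the openai or langchain packages; `OPENAI_API_KEY` is assumed here as the fallback variable name.

```python
import os
from pathlib import Path


def load_api_key(path="openai.key", env_var="OPENAI_API_KEY"):
    """Return the API key stored in `path`, else the value of `env_var`.

    Hypothetical helper for illustration only.
    """
    key_file = Path(path)
    if key_file.exists():
        # strip() drops a trailing newline that a text editor may have added
        key = key_file.read_text().strip()
        if key:
            return key
    # Fall back to the environment variable if the file is missing or empty
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path!r} or ${env_var}")
    return key
```

The returned string could then be passed as `openai_api_key` when constructing the model.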

Step 2. Install Python Tools¶

Uncomment and run the following line to install the required packages.

In [18]:
# !pip install openai langchain

Using OpenAI with LangChain¶

In [19]:
from langchain.llms import OpenAI
In [20]:
# Read the key and strip any trailing newline; the with-block closes the file
with open("openai.key", "r") as f:
    openai_key = f.read().strip()
llm = OpenAI(openai_api_key=openai_key,
             model="gpt-3.5-turbo-instruct", 
             #temperature=0, 
             max_tokens=512)
In [21]:
llm("What is the capital of California?")
Out[21]:
'\n\nThe capital of California is Sacramento.'
In [22]:
for chunk in llm.stream("Write a short song about data science and large language models."):
    print(chunk, end="", flush=True)
Verse 1:
Data science, it's the future we embrace
Using numbers to unlock the human race
From predictions to insights, we see it all
But now we've got something new, standing tall

Chorus:
Large language models, they're here to stay
With endless knowledge, they'll light the way
From GPT-3 to BERT, they're the real deal
In data science, they're the ultimate reveal

Verse 2:
With billions of parameters, they're learning fast
Analyzing text, from present to the past
They can generate, translate, and summarize
Making our tasks easier, no surprise

Chorus:
Large language models, they're here to stay
With endless knowledge, they'll light the way
From GPT-3 to BERT, they're the real deal
In data science, they're the ultimate reveal

Bridge:
But with great power, comes great responsibility
We must use them wisely, with integrity
For the future of AI, lies in our hands
Let's harness their potential, and make great plans

Chorus:
Large language models, they're here to stay
With endless knowledge, they'll light the way
From GPT-3 to BERT, they're the real deal
In data science, they're the ultimate reveal

Outro:
So let's embrace data science, and these models too
Together we can do things, we never thought we'd do
With endless possibilities, the future's bright
Data science and large language models, shining light.

Running Locally with Ollama and LangChain¶

You can download and install Ollama from:

https://ollama.ai/download

Ollama runs models locally on your own machine.

In [23]:
# from langchain.llms import Ollama
# from langchain.callbacks.manager import CallbackManager
# from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# vicuna = Ollama(
#     model="vicuna", 
#     #    temperature=0,
#     callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
# )
In [24]:
# vicuna("What is the capital of California? Answer with only one word.")
In [25]:
# vicuna("Write a short song about data science and large language models.")






Data Analytics¶

We can use LLMs to help analyze data.

In [26]:
import pandas as pd
tweets = pd.read_json("AOC_recent_tweets.txt")
list(tweets['full_text'][0:10])
Out[26]:
['RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…',
 'RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…',
 '(Source: https://t.co/3o5JEr6zpd)',
 'Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU',
 'What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id',
 'Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat.',
 'Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id',
 'RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.…',
 'RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog…',
 'RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them.']




Suppose I wanted to evaluate whether a tweet is making a statement about minimum wage.

In [27]:
prompt = """
Is the following text making a statement about minimum wage? You should answer either Yes or No.

{text}

Answer:
"""
questions = [prompt.format_map(dict(text=t)) for t in tweets['full_text'].head(20)]
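
To see what the model actually receives, it can help to print a single formatted prompt. The sketch below repeats the template from the cell above and fills it with a made-up sentence (not a tweet from the dataset). It also shows that braces inside the substituted text are left alone, since only the template itself is parsed for `{text}`.

```python
prompt = """
Is the following text making a statement about minimum wage? You should answer either Yes or No.

{text}

Answer:
"""

# Made-up example text for illustration; it is not taken from the tweet dataset
example = "Raising the minimum wage to $15/hour {would} help workers."

# format_map replaces {text} in the template; braces inside `example` are not interpreted
question = prompt.format_map(dict(text=example))
print(question)
```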

Ask each of the LLMs to answer the questions:

In [28]:
open_ai_answers = llm.batch(questions)
open_ai_answers
Out[28]:
['\nYes',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nNo',
 '\nNo',
 'Yes',
 '\nNo',
 '\nNo',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nNo',
 'No',
 'No',
 '\nNo',
 'No',
 '\nNo',
 '\nYes',
 'No ']
In [29]:
# vicuna_answers = vicuna.batch(questions)
# vicuna_answers
In [30]:
pd.set_option('display.max_colwidth', None)
df = pd.DataFrame({"OpenAI": open_ai_answers, 
                   # "Vicuna": vicuna_answers,
                   "Text": tweets['full_text'].head(20)})
df["OpenAI"] = df["OpenAI"].str.contains("Y")
# df["Vicuna"] = df["Vicuna"].str.contains("Y")
df
Out[30]:
OpenAI Text
0 True RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…
1 True RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…
2 False (Source: https://t.co/3o5JEr6zpd)
3 False Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU
4 False What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id
5 False Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat.
6 True Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id
7 False RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.…
8 False RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog…
9 True RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them.
10 False Amazon workers have the right to form a union.\n\nAnti-union tactics like these, especially from a trillion-dollar company trying to disrupt essential workers from organizing for better wages and dignified working conditions in a pandemic, are wrong. https://t.co/nTDqMUapYs
11 False RT @WorkingFamilies: Voters elected Democrats to deliver more relief, not less.
12 False We should preserve what was there and not peg it to outdated 2019 income. People need help!
13 False If conservative Senate Dems institute a lower income threshold in the next round of checks, that could potentially mean the first round of checks under Trump help more people than the first round under Biden.\n\nDo we want to do that? No? Then let’s stop playing & just help people.
14 False @iamjoshfitz 😂 call your member of Congress, they can help track it down
15 False All Dems need for the slam dunk is to do what people elected us to do: help as many people as possible.\n\nIt’s not hard. Let’s not screw it up with austerity nonsense that squeezes the working class yet never makes a peep when tax cuts for yachts and private jets are proposed.
16 False It should be $2000 to begin w/ anyway. Brutally means-testing a $1400 round is going to hurt so many people. THAT is the risk we can’t afford.\n\nIncome thresholds already work in reverse & lag behind reality. Conservative Dems can ask to tax $ back later if they’re so concerned.
17 False We cannot cut off relief at $50k. It is shockingly out of touch to assert that $50k is “too wealthy” to receive relief.\n\nMillions are on the brink of eviction. Give too little and they’re devastated. Give “too much” and a single mom might save for a rainy day. This isn’t hard. https://t.co/o14r3phJeH
18 True Imagine being a policymaker in Washington, having witnessed the massive economic, social, and health destruction over the last year, and think that the greatest policy risk we face is providing *too much* relief.\n\nSounds silly, right?\n\n$1.9T should be a floor, not a ceiling.
19 False @AndrewYang @TweetBenMax @RitchieTorres Thanks @AndrewYang! Happy to chat about the plan details and the community effort that’s gone into this legislation. 🌃🌎
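
The `str.contains("Y")` check treats any completion containing a capital Y as a yes, which works for the short answers seen here but would misread a longer reply. A minimal sketch of a stricter parser follows, assuming the answer should begin with "Yes" or "No". The function name `parse_yes_no` is hypothetical, and the last test string is made up to show the unclear case.

```python
def parse_yes_no(answer):
    """Map a raw completion to True (yes), False (no), or None (unclear)."""
    words = answer.strip().lower().split()
    if not words:
        return None
    # Look only at the first word, ignoring surrounding punctuation
    first = words[0].strip(".,!:;\"'")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


# The first four strings mirror forms seen in the output above; the last is made up
raw = ['\nYes', 'Yes', '\nNo', 'No ', '\nMaybe, it depends. Yes?']
parsed = [parse_yes_no(a) for a in raw]
print(parsed)
```

With pandas, the same function could be applied to a column via `df["OpenAI"].map(parse_yes_no)`, leaving unclear answers as missing values to review by hand.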
In [31]:
prompt = """
Is the following text self promoting? You should answer either Yes or No.

{text}

Answer:
"""
questions = [prompt.format_map(dict(text=t)) for t in tweets['full_text'].head(20)]
open_ai_answers2 = llm.batch(questions)
df2 = pd.DataFrame({"OpenAI": open_ai_answers2, 
                   "Text": tweets['full_text'].head(20)})
# df2["OpenAI"] = df2["OpenAI"].str.contains("Y")
df2
Out[31]:
OpenAI Text
0 \nNo RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…
1 \nNo RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…
2 No (Source: https://t.co/3o5JEr6zpd)
3 \nNo Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU
4 No What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id
5 \nYes Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat.
6 \nNo Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id
7 \nNo RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.…
8 \nYes RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog…
9 \nNo RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them.
10 \nNo Amazon workers have the right to form a union.\n\nAnti-union tactics like these, especially from a trillion-dollar company trying to disrupt essential workers from organizing for better wages and dignified working conditions in a pandemic, are wrong. https://t.co/nTDqMUapYs
11 \nNo RT @WorkingFamilies: Voters elected Democrats to deliver more relief, not less.
12 No We should preserve what was there and not peg it to outdated 2019 income. People need help!
13 No If conservative Senate Dems institute a lower income threshold in the next round of checks, that could potentially mean the first round of checks under Trump help more people than the first round under Biden.\n\nDo we want to do that? No? Then let’s stop playing & just help people.
14 No @iamjoshfitz 😂 call your member of Congress, they can help track it down
15 \nYes All Dems need for the slam dunk is to do what people elected us to do: help as many people as possible.\n\nIt’s not hard. Let’s not screw it up with austerity nonsense that squeezes the working class yet never makes a peep when tax cuts for yachts and private jets are proposed.
16 \nNo It should be $2000 to begin w/ anyway. Brutally means-testing a $1400 round is going to hurt so many people. THAT is the risk we can’t afford.\n\nIncome thresholds already work in reverse & lag behind reality. Conservative Dems can ask to tax $ back later if they’re so concerned.
17 \nNo We cannot cut off relief at $50k. It is shockingly out of touch to assert that $50k is “too wealthy” to receive relief.\n\nMillions are on the brink of eviction. Give too little and they’re devastated. Give “too much” and a single mom might save for a rainy day. This isn’t hard. https://t.co/o14r3phJeH
18 \nNo Imagine being a policymaker in Washington, having witnessed the massive economic, social, and health destruction over the last year, and think that the greatest policy risk we face is providing *too much* relief.\n\nSounds silly, right?\n\n$1.9T should be a floor, not a ceiling.
19 \nYes @AndrewYang @TweetBenMax @RitchieTorres Thanks @AndrewYang! Happy to chat about the plan details and the community effort that’s gone into this legislation. 🌃🌎
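
The raw answers in this table are easier to compare once the surrounding whitespace is stripped. A minimal sketch that tallies them and lists the rows the model flagged, with the answer list copied by hand from the OpenAI column above:

```python
from collections import Counter

# Raw completions copied from the OpenAI column of the table above
answers = ['\nNo', '\nNo', 'No', '\nNo', 'No', '\nYes', '\nNo', '\nNo', '\nYes', '\nNo',
           '\nNo', '\nNo', 'No', 'No', 'No', '\nYes', '\nNo', '\nNo', '\nNo', '\nYes']

# Strip newlines and spaces so '\nNo' and 'No' count as the same label
counts = Counter(a.strip() for a in answers)
print(counts)

# Row indices the model labeled as self promoting
flagged = [i for i, a in enumerate(answers) if a.strip() == "Yes"]
print(flagged)
```

Reading the flagged tweets by hand is a quick sanity check on whether the model's notion of "self promoting" matches your own.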