Lecture 27 – Data 100, Spring 2025¶


Getting Set Up¶

You can run this notebook on the JupyterHub machines, but you will need to set up an OpenAI account. Alternatively, if you are running on your own computer, you can also try to run a model locally.
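If you do want to try a local model, one option is to serve it with Ollama and call it through LangChain. This is a minimal sketch, not something this notebook depends on; it assumes you have installed Ollama, pulled a model (llama3.2 here is just an example), and run pip install langchain-ollama:

# Minimal local-model sketch (assumes the Ollama server is running and
# that `ollama pull llama3.2` has been done; the model name is an example).
from langchain_ollama import OllamaLLM

local_llm = OllamaLLM(model="llama3.2")
print(local_llm.invoke("What is the capital of California? Provide a short answer."))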

Step 1. Create an OpenAI account¶

You can create a free account, which comes with some initial free credits, by going here:

https://platform.openai.com

You will then need to get an API key. Save that key to a local file called openai.key:

In [1]:
# with open("openai.key", "w") as f:
#     f.write("YOUR KEY")

Step 2. Install Python Tools¶

Run the following cell to install the required packages.

In [2]:
!pip install -U openai langchain langchain-openai
Collecting openai
  Downloading openai-1.76.2-py3-none-any.whl.metadata (25 kB)
Collecting langchain
  Downloading langchain-0.3.24-py3-none-any.whl.metadata (7.8 kB)
Collecting langchain-openai
  Downloading langchain_openai-0.3.15-py3-none-any.whl.metadata (2.3 kB)
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.5.1 requires sympy==1.13.1; python_version >= "3.9", but you have sympy 1.13.3 which is incompatible.
Successfully installed distro-1.9.0 jiter-0.9.0 jsonpatch-1.33 langchain-0.3.24 langchain-core-0.3.56 langchain-openai-0.3.15 langchain-text-splitters-0.3.8 langsmith-0.3.39 openai-1.76.2 orjson-3.10.18 requests-toolbelt-1.0.0 tiktoken-0.9.0 typing-extensions-4.13.2 zstandard-0.23.0

Using OpenAI with LangChain¶

In [3]:
from langchain_openai import OpenAI
import pandas as pd
In [4]:
openai_key = open("openai.key", "r").readline().strip()
llm = OpenAI(openai_api_key=openai_key,
             model_name="gpt-3.5-turbo-instruct")
In [5]:
llm.invoke("What is the capital of California? Provide a short answer.")
Out[5]:
'\n\nSacramento. '
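For short factual or classification-style prompts, you may want more deterministic output. The LangChain OpenAI wrapper accepts a temperature argument; a sketch (temperature=0 is a common choice, not something this lecture prescribes):

# Lower temperature -> more repeatable completions (sketch).
deterministic_llm = OpenAI(openai_api_key=openai_key,
                           model_name="gpt-3.5-turbo-instruct",
                           temperature=0)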
In [6]:
for chunk in llm.stream("Write a short song about data science and large language models."):
    print(chunk, end="", flush=True)

Verse 1:
Data science, the magic of numbers
Uncovering secrets, we all wonder
From mountains of data, we find the truth
And with each discovery, we gain new proof

Chorus:
Data science, it's the key
To unlock mysteries, we can't see
With large language models, we can explore
The depths of knowledge, never seen before

Verse 2:
Language models, a powerful tool
Processing words, making sense of it all
From speech to text, they can understand
And help us communicate, in a whole new land

Chorus:
Data science, it's the key
To unlock mysteries, we can't see
With large language models, we can explore
The depths of knowledge, never seen before

Bridge:
 of data, we learn more
And with each model, we open new doors
The possibilities, they seem endless
With large language models, we are fearless

Chorus:
Data science, it's the key
To unlock mysteries, we can't see
With large language models, we can explore
The depths of knowledge, never seen before

Outro:
Let's embrace this world of data and code
And together, let's unlock

Data Analytics¶

We can use LLMs to help analyze data.

In [7]:
tweets = pd.read_json("AOC_recent_tweets.txt")
list(tweets['full_text'][0:10])
Out[7]:
['RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…',
 'RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…',
 '(Source: https://t.co/3o5JEr6zpd)',
 'Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU',
 'What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id',
 'Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat.',
 'Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id',
 'RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.…',
 'RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog…',
 'RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them.']




Suppose we wanted to evaluate whether a tweet is making a statement about minimum wage.

In [8]:
prompt = """
Is the following text making a statement about minimum wage? You should answer either Yes or No.

{text}

Answer:
"""
questions = [prompt.format_map(dict(text=t)) for t in tweets['full_text'].head(20)]
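Equivalently, we could build the same strings with LangChain's PromptTemplate instead of format_map; a sketch that should produce identical prompts:

from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(prompt)
questions_alt = [template.format(text=t) for t in tweets['full_text'].head(20)]
assert questions_alt == questions  # same strings either way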

Ask the LLM to answer each of the questions:

In [9]:
open_ai_answers = llm.batch(questions)
open_ai_answers
Out[9]:
['No ',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nNo',
 '\nNo',
 '\nNo',
 '\nNo',
 '\nNo',
 '\nYes',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nYes',
 '\nNo',
 '\nYes',
 '\nNo',
 '\nNo',
 '\nYes',
 '\nNo']
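Note the answers come back with inconsistent leading whitespace, so the parsing below just checks for a "Y" anywhere in the string. That works for clean Yes/No answers but is brittle; a slightly safer sketch:

# Normalize whitespace and compare the first word instead of matching any "Y".
answers_bool = [a.strip().startswith("Yes") for a in open_ai_answers]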
In [10]:
pd.set_option('display.max_colwidth', None)
df = pd.DataFrame({"OpenAI": open_ai_answers, 
                   "Text": tweets['full_text'].head(20)})
df["OpenAI"] = df["OpenAI"].str.contains("Y")
df
Out[10]:
OpenAI Text
0 False RT @RepEscobar: Our country has the moral obligation and responsibility to reunite every single family separated at the southern border.\n\nT…
1 True RT @RoKhanna: What happens when we guarantee $15/hour?\n\n💰 31% of Black workers and 26% of Latinx workers get raises.\n😷 A majority of essent…
2 False (Source: https://t.co/3o5JEr6zpd)
3 False Joe Cunningham pledged to never take corporate PAC money, and he never did. Mace said she’ll cash every check she gets. Yet another way this is a downgrade. https://t.co/DytsQXKXgU
4 False What’s even more gross is that Mace takes corporate PAC money.\n\nShe’s already funded by corporations. Now she’s choosing to swindle working people on top of it.\n\nPeak scam artistry. Caps for cash 💰 https://t.co/CcVxgDF6id
5 False Joe Cunningham already proving to be leagues more decent + honest than Mace seems capable of.\n\nThe House was far better off w/ Cunningham. It’s sad to see Mace diminish the representation of her community by launching a reputation of craven dishonesty right off the bat.
6 False Pretty horrible.\n\nWell, it’s good to know what kind of person she is early. Also good to know that Mace is cut from the same Trump cloth of dishonesty and opportunism.\n\nSad to see a colleague intentionally hurt other women and survivors to make a buck. Thought she’d be better. https://t.co/CcVxgDF6id
7 False RT @jaketapper: .@RepNancyMace fundraising off the false smear that @AOC misrepresented her experience during the insurrection. She didn’t.…
8 False RT @RepMcGovern: One reason Washington can’t “come together” is because of people like her sending out emails like this.\n\nShe should apolog…
9 True RT @JoeNeguse: Just to be clear, “targeting” stimulus checks means denying them to some working families who would otherwise receive them.
10 True Amazon workers have the right to form a union.\n\nAnti-union tactics like these, especially from a trillion-dollar company trying to disrupt essential workers from organizing for better wages and dignified working conditions in a pandemic, are wrong. https://t.co/nTDqMUapYs
11 False RT @WorkingFamilies: Voters elected Democrats to deliver more relief, not less.
12 False We should preserve what was there and not peg it to outdated 2019 income. People need help!
13 True If conservative Senate Dems institute a lower income threshold in the next round of checks, that could potentially mean the first round of checks under Trump help more people than the first round under Biden.\n\nDo we want to do that? No? Then let’s stop playing &amp; just help people.
14 False @iamjoshfitz 😂 call your member of Congress, they can help track it down
15 True All Dems need for the slam dunk is to do what people elected us to do: help as many people as possible.\n\nIt’s not hard. Let’s not screw it up with austerity nonsense that squeezes the working class yet never makes a peep when tax cuts for yachts and private jets are proposed.
16 False It should be $2000 to begin w/ anyway. Brutally means-testing a $1400 round is going to hurt so many people. THAT is the risk we can’t afford.\n\nIncome thresholds already work in reverse &amp; lag behind reality. Conservative Dems can ask to tax $ back later if they’re so concerned.
17 False We cannot cut off relief at $50k. It is shockingly out of touch to assert that $50k is “too wealthy” to receive relief.\n\nMillions are on the brink of eviction. Give too little and they’re devastated. Give “too much” and a single mom might save for a rainy day. This isn’t hard. https://t.co/o14r3phJeH
18 True Imagine being a policymaker in Washington, having witnessed the massive economic, social, and health destruction over the last year, and think that the greatest policy risk we face is providing *too much* relief.\n\nSounds silly, right?\n\n$1.9T should be a floor, not a ceiling.
19 False @AndrewYang @TweetBenMax @RitchieTorres Thanks @AndrewYang! Happy to chat about the plan details and the community effort that’s gone into this legislation. 🌃🌎

Working with Google Gemini Models¶

You will need to install the Gemini API client to use the code below. You can install it by running the following command:

In [11]:
!pip install -q -U google-generativeai

You will need to obtain an API key. Unfortunately, UC Berkeley has not yet enabled access to the Gemini API for Berkeley accounts, but you can use any free Google account instead. You can obtain an API key by following the instructions here.

Once you get an API Key you can put it here:

In [14]:
# with open("gemini_key.txt", "w") as f:
#     f.write("YOUR KEY")
In [15]:
GEMINI_API_KEY = None
if not GEMINI_API_KEY:
    with open("gemini_key.txt", "r") as f:
        GEMINI_API_KEY = f.read().strip()
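If you would rather not keep the key in a file, you could fall back to an environment variable; a small sketch (assumes you have exported GEMINI_API_KEY in your shell):

import os

# Prefer the environment variable if it is set (assumption: exported in your shell).
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", GEMINI_API_KEY)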

We can then connect to the Gemini API using the following code:

In [16]:
import google.generativeai as genai
genai.configure(api_key=GEMINI_API_KEY)

models_df = pd.DataFrame(genai.list_models())
models_df
Out[16]:
name base_model_id version display_name description input_token_limit output_token_limit supported_generation_methods temperature max_temperature top_p top_k
0 models/chat-bison-001 001 PaLM 2 Chat (Legacy) A legacy text-only model optimized for chat conversations 4096 1024 [generateMessage, countMessageTokens] 0.25 NaN 0.95 40.0
1 models/text-bison-001 001 PaLM 2 (Legacy) A legacy model that understands text and generates text as an output 8196 1024 [generateText, countTextTokens, createTunedTextModel] 0.70 NaN 0.95 40.0
2 models/embedding-gecko-001 001 Embedding Gecko Obtain a distributed representation of a text. 1024 1 [embedText, countTextTokens] NaN NaN NaN NaN
3 models/gemini-1.0-pro-vision-latest 001 Gemini 1.0 Pro Vision The original Gemini 1.0 Pro Vision model version which was optimized for image understanding. Gemini 1.0 Pro Vision was deprecated on July 12, 2024. Move to a newer Gemini version. 12288 4096 [generateContent, countTokens] 0.40 NaN 1.00 32.0
4 models/gemini-pro-vision 001 Gemini 1.0 Pro Vision The original Gemini 1.0 Pro Vision model version which was optimized for image understanding. Gemini 1.0 Pro Vision was deprecated on July 12, 2024. Move to a newer Gemini version. 12288 4096 [generateContent, countTokens] 0.40 NaN 1.00 32.0
5 models/gemini-1.5-pro-latest 001 Gemini 1.5 Pro Latest Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens. 2000000 8192 [generateContent, countTokens] 1.00 2.0 0.95 40.0
6 models/gemini-1.5-pro-001 001 Gemini 1.5 Pro 001 Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens, released in May of 2024. 2000000 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
7 models/gemini-1.5-pro-002 002 Gemini 1.5 Pro 002 Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens, released in September of 2024. 2000000 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
8 models/gemini-1.5-pro 001 Gemini 1.5 Pro Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens, released in May of 2024. 2000000 8192 [generateContent, countTokens] 1.00 2.0 0.95 40.0
9 models/gemini-1.5-flash-latest 001 Gemini 1.5 Flash Latest Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks. 1000000 8192 [generateContent, countTokens] 1.00 2.0 0.95 40.0
10 models/gemini-1.5-flash-001 001 Gemini 1.5 Flash 001 Stable version of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in May of 2024. 1000000 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
11 models/gemini-1.5-flash-001-tuning 001 Gemini 1.5 Flash 001 Tuning Version of Gemini 1.5 Flash that supports tuning, our fast and versatile multimodal model for scaling across diverse tasks, released in May of 2024. 16384 8192 [generateContent, countTokens, createTunedModel] 1.00 2.0 0.95 64.0
12 models/gemini-1.5-flash 001 Gemini 1.5 Flash Alias that points to the most recent stable version of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks. 1000000 8192 [generateContent, countTokens] 1.00 2.0 0.95 40.0
13 models/gemini-1.5-flash-002 002 Gemini 1.5 Flash 002 Stable version of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in September of 2024. 1000000 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
14 models/gemini-1.5-flash-8b 001 Gemini 1.5 Flash-8B Stable version of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model, released in October of 2024. 1000000 8192 [createCachedContent, generateContent, countTokens] 1.00 2.0 0.95 40.0
15 models/gemini-1.5-flash-8b-001 001 Gemini 1.5 Flash-8B 001 Stable version of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model, released in October of 2024. 1000000 8192 [createCachedContent, generateContent, countTokens] 1.00 2.0 0.95 40.0
16 models/gemini-1.5-flash-8b-latest 001 Gemini 1.5 Flash-8B Latest Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model, released in October of 2024. 1000000 8192 [createCachedContent, generateContent, countTokens] 1.00 2.0 0.95 40.0
17 models/gemini-1.5-flash-8b-exp-0827 001 Gemini 1.5 Flash 8B Experimental 0827 Experimental release (August 27th, 2024) of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model. Replaced by Gemini-1.5-flash-8b-001 (stable). 1000000 8192 [generateContent, countTokens] 1.00 2.0 0.95 40.0
18 models/gemini-1.5-flash-8b-exp-0924 001 Gemini 1.5 Flash 8B Experimental 0924 Experimental release (September 24th, 2024) of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model. Replaced by Gemini-1.5-flash-8b-001 (stable). 1000000 8192 [generateContent, countTokens] 1.00 2.0 0.95 40.0
19 models/gemini-2.5-pro-exp-03-25 2.5-exp-03-25 Gemini 2.5 Pro Experimental 03-25 Experimental release (March 25th, 2025) of Gemini 2.5 Pro 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
20 models/gemini-2.5-pro-preview-03-25 2.5-preview-03-25 Gemini 2.5 Pro Preview 03-25 Gemini 2.5 Pro Preview 03-25 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
21 models/gemini-2.5-flash-preview-04-17 2.5-preview-04-17 Gemini 2.5 Flash Preview 04-17 Preview release (April 17th, 2025) of Gemini 2.5 Flash 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
22 models/gemini-2.0-flash-exp 2.0 Gemini 2.0 Flash Experimental Gemini 2.0 Flash Experimental 1048576 8192 [generateContent, countTokens, bidiGenerateContent] 1.00 2.0 0.95 40.0
23 models/gemini-2.0-flash 2.0 Gemini 2.0 Flash Gemini 2.0 Flash 1048576 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
24 models/gemini-2.0-flash-001 2.0 Gemini 2.0 Flash 001 Stable version of Gemini 2.0 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in January of 2025. 1048576 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
25 models/gemini-2.0-flash-exp-image-generation 2.0 Gemini 2.0 Flash (Image Generation) Experimental Gemini 2.0 Flash (Image Generation) Experimental 1048576 8192 [generateContent, countTokens, bidiGenerateContent] 1.00 2.0 0.95 40.0
26 models/gemini-2.0-flash-lite-001 2.0 Gemini 2.0 Flash-Lite 001 Stable version of Gemini 2.0 Flash Lite 1048576 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
27 models/gemini-2.0-flash-lite 2.0 Gemini 2.0 Flash-Lite Gemini 2.0 Flash-Lite 1048576 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
28 models/gemini-2.0-flash-lite-preview-02-05 preview-02-05 Gemini 2.0 Flash-Lite Preview 02-05 Preview release (February 5th, 2025) of Gemini 2.0 Flash Lite 1048576 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
29 models/gemini-2.0-flash-lite-preview preview-02-05 Gemini 2.0 Flash-Lite Preview Preview release (February 5th, 2025) of Gemini 2.0 Flash Lite 1048576 8192 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 40.0
30 models/gemini-2.0-pro-exp 2.5-exp-03-25 Gemini 2.0 Pro Experimental Experimental release (March 25th, 2025) of Gemini 2.5 Pro 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
31 models/gemini-2.0-pro-exp-02-05 2.5-exp-03-25 Gemini 2.0 Pro Experimental 02-05 Experimental release (March 25th, 2025) of Gemini 2.5 Pro 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
32 models/gemini-exp-1206 2.5-exp-03-25 Gemini Experimental 1206 Experimental release (March 25th, 2025) of Gemini 2.5 Pro 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
33 models/gemini-2.0-flash-thinking-exp-01-21 2.5-preview-04-17 Gemini 2.5 Flash Preview 04-17 Preview release (April 17th, 2025) of Gemini 2.5 Flash 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
34 models/gemini-2.0-flash-thinking-exp 2.5-preview-04-17 Gemini 2.5 Flash Preview 04-17 Preview release (April 17th, 2025) of Gemini 2.5 Flash 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
35 models/gemini-2.0-flash-thinking-exp-1219 2.5-preview-04-17 Gemini 2.5 Flash Preview 04-17 Preview release (April 17th, 2025) of Gemini 2.5 Flash 1048576 65536 [generateContent, countTokens, createCachedContent] 1.00 2.0 0.95 64.0
36 models/learnlm-1.5-pro-experimental 001 LearnLM 1.5 Pro Experimental Alias that points to the most recent stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens. 32767 8192 [generateContent, countTokens] 1.00 2.0 0.95 64.0
37 models/learnlm-2.0-flash-experimental 2.0 LearnLM 2.0 Flash Experimental LearnLM 2.0 Flash Experimental 1048576 32768 [generateContent, countTokens] 1.00 2.0 0.95 64.0
38 models/gemma-3-1b-it 001 Gemma 3 1B 32768 8192 [generateContent, countTokens] 1.00 NaN 0.95 64.0
39 models/gemma-3-4b-it 001 Gemma 3 4B 32768 8192 [generateContent, countTokens] 1.00 NaN 0.95 64.0
40 models/gemma-3-12b-it 001 Gemma 3 12B 32768 8192 [generateContent, countTokens] 1.00 NaN 0.95 64.0
41 models/gemma-3-27b-it 001 Gemma 3 27B 131072 8192 [generateContent, countTokens] 1.00 NaN 0.95 64.0
42 models/embedding-001 001 Embedding 001 Obtain a distributed representation of a text. 2048 1 [embedContent] NaN NaN NaN NaN
43 models/text-embedding-004 004 Text Embedding 004 Obtain a distributed representation of a text. 2048 1 [embedContent] NaN NaN NaN NaN
44 models/gemini-embedding-exp-03-07 exp-03-07 Gemini Embedding Experimental 03-07 Obtain a distributed representation of a text. 8192 1 [embedContent, countTextTokens] NaN NaN NaN NaN
45 models/gemini-embedding-exp exp-03-07 Gemini Embedding Experimental Obtain a distributed representation of a text. 8192 1 [embedContent, countTextTokens] NaN NaN NaN NaN
46 models/aqa 001 Model that performs Attributed Question Answering. Model trained to return answers to questions that are grounded in provided sources, along with estimating answerable probability. 7168 1024 [generateAnswer] 0.20 NaN 1.00 40.0
47 models/imagen-3.0-generate-002 002 Imagen 3.0 002 model Vertex served Imagen 3.0 002 model 480 8192 [predict] NaN NaN NaN NaN
48 models/gemini-2.0-flash-live-001 001 Gemini 2.0 Flash 001 Gemini 2.0 Flash 001 131072 8192 [bidiGenerateContent, countTokens] 1.00 2.0 0.95 64.0
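Not every entry in this table can generate text (some are embedding or image models), so it can be handy to filter on supported_generation_methods; a sketch using the columns above:

# Keep only models that support the generateContent method.
gen_models = models_df[models_df["supported_generation_methods"]
                       .apply(lambda methods: "generateContent" in methods)]
gen_models["name"]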

We can obtain a model and use it to make a prediction. Here we will use the "gemini-2.5-flash-preview-04-17" model, which is generally pretty good for a wide range of tasks.

In [17]:
from IPython.display import Markdown
display(Markdown(models_df[models_df["name"] == "models/gemini-2.5-flash-preview-04-17"]['description'].values[0]))

Preview release (April 17th, 2025) of Gemini 2.5 Flash

In [18]:
model = genai.GenerativeModel("gemini-2.5-flash-preview-04-17")

Use the model to generate text:

In [19]:
response = model.generate_content("Why is Data 100 great?")
Markdown(response.text)
Out[19]:

Okay, let's talk about why Data 100 (specifically the UC Berkeley version, which is arguably the most famous and influential one) is widely considered a great course.

Here are the key reasons:

  1. Comprehensive and Integrated Curriculum: Data 100 doesn't just teach isolated concepts. It brilliantly integrates programming, data manipulation, visualization, statistical thinking (inference), and fundamental machine learning algorithms into a cohesive workflow. It shows students how these pieces fit together to solve real data problems.
  2. Builds a Strong Foundation: Building upon introductory concepts (like those from Data 8 or a stats/programming prerequisite), Data 100 provides a solid, technical base for more advanced data science topics. It teaches why things work, not just how to use a library function.
  3. Hands-on and Practical: The course heavily emphasizes practical application through labs and assignments using industry-standard tools like Python, Pandas, NumPy, Matplotlib, and scikit-learn. Students spend a significant amount of time coding and manipulating real-world(ish) datasets.
  4. Rigorous and Challenging (in a good way): Data 100 is known for being demanding. It requires students to think critically, debug complex code, and understand the underlying principles of the algorithms they use. This rigor leads to deep learning and prepares students for the challenges of real data science work.
  5. Project-Based Learning: A significant portion of the course is dedicated to larger projects where students apply everything they've learned – from data cleaning and visualization to model building and evaluation – to a substantial problem. This mimics real-world data science workflows and helps solidify understanding.
  6. Focus on the Entire Data Lifecycle: It doesn't just focus on modeling. It covers essential skills like data cleaning ("data wrangling"), exploratory data analysis (EDA), and communicating results, which are crucial but often overlooked in more algorithm-focused courses.
  7. Emphasis on Understanding Principles: While it teaches how to use powerful libraries, Data 100 spends time explaining the mechanics behind algorithms like linear regression, logistic regression, and k-nearest neighbors. This conceptual understanding makes students more adaptable when facing new problems or technologies.
  8. Real-World Tools: Students gain proficiency in tools and libraries (like Pandas for data manipulation, Matplotlib/Seaborn for visualization, Scikit-learn for ML) that are standard in the data science industry and research.
  9. Prepares for Future Opportunities: The skills and knowledge gained in Data 100 are highly valuable for internships, research positions, and entry-level data science roles, as well as for pursuing more specialized upper-division courses.
  10. Strong Community and Resources (typically): As a popular, large course, it usually has extensive resources, including helpful TAs, detailed documentation, and a large student community for support.

In short, Data 100 is great because it's a comprehensive, challenging, and practical course that effectively bridges the gap between introductory statistics/programming and more advanced machine learning/data science topics, equipping students with the skills and understanding needed to tackle real-world data problems.
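generate_content also accepts a generation_config if you want to control sampling; a sketch (the particular values are arbitrary, not course recommendations):

response = model.generate_content(
    "Why is Data 100 great? Answer in two sentences.",
    generation_config={
        "temperature": 0.2,        # lower -> more deterministic
        "max_output_tokens": 256,  # cap the response length
    },
)
Markdown(response.text)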

Working with images¶

In [20]:
from IPython.display import Image
from IPython.core.display import HTML
img = Image("data100_logo.png", width=200, height=200)
img
Out[20]:
(The Data 100 logo image is displayed here.)
In [21]:
response = model.generate_content([
    """What is going on in this picture I downloaded from 
    the Berkeley Data100 Course Website? 
    How does it relate to Data Science?""", img])
Markdown(response.text)
Out[21]:

Okay, let's break down the image and its relation to Data Science, especially in the context of Berkeley's Data100 course.

  1. What is going on in the picture? The image is a logo for the Berkeley Data100 course.

    • It clearly displays the text "DATA 100", which is the name of the course.
    • There are curved white lines that could represent the flow of data, statistical curves, or perhaps the process of data manipulation and analysis.
    • Most importantly, there is a cartoon panda bear resting comfortably on these lines.
  2. How does it relate to Data Science? This logo is a visual representation of a key tool used in Data Science, particularly in introductory courses like Data100: the Pandas library in Python.

    • Pandas: Pandas is a fundamental and widely-used open-source Python library for data manipulation and analysis. It provides data structures (like DataFrames) and functions needed to efficiently work with structured data (like tables).
    • The Panda Mascot: The panda bear is the unofficial (but very common) mascot of the Pandas library. Using the panda in the Data100 logo is a direct and clear visual reference to this essential tool that students will learn and use extensively in the course.
    • The Data & Curves: The "DATA" text and the curved lines represent the subject matter itself – data and potentially the patterns, transformations, or analysis performed on it.

In summary, the image is the logo for UC Berkeley's Data100 course. It prominently features the course name ("DATA 100") and a panda bear, which is the widely recognized mascot for the Pandas library. Pandas is a core tool for data manipulation and analysis taught and used in Data100, making the panda a very relevant and symbolic representation of the skills and tools learned in the course. The curved lines represent the data itself that students will be working with using tools like Pandas.
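Here we passed an IPython Image object, which the SDK understands; generate_content also accepts PIL images, which is convenient if you want to crop or resize first. A sketch, assuming Pillow is installed:

from PIL import Image as PILImage

pil_img = PILImage.open("data100_logo.png")
response = model.generate_content(["Describe this logo briefly.", pil_img])
Markdown(response.text)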

You can stream content back, which could be useful for interacting with the model.

In [22]:
from IPython.display import clear_output

response = model.generate_content("Write a poem about Data Science.", stream=True)

output = ""
for chunk in response:
    output += chunk.text
    clear_output(wait=True)
    display(Markdown(output))

From server racks to digital streams, A restless tide, a sea of dreams. Where every click and scroll and trace, Leaves whispers floating time and space.

A wild expanse, unshaped, untamed, Raw numbers waiting to be claimed. A chaos vast, a silent roar, Data piles upon the shore.

Then comes the mind, with patient grace, To tame the mess, prepare the space. With code and tool, a steady hand, To clean the noise, to understand.

They filter out the dust and blur, Make scattered data now cohere. Like sculpting clay or polishing stone, A structured beauty now is shown.

With charts that bloom and graphs that gleam, They paint a visual, vivid dream. Exploring paths, both wide and deep, Unlocking secrets numbers keep.

Then algorithms take their flight, Mathematical engines, burning bright. To seek the patterns, weave the thread, Connect the living and the dead.

They build their models, sharp and keen, To learn from what the past has been. To find the links, the hidden art, The beating pulse, the data's heart.

From insights won, a path is shown, Predictions whisper, softly blown. Guiding decisions, large and small, Preventing failure, standing tall.

It's more than math, beyond the code, It's knowledge rising, lifting load. To see the future, clearer sight, And flood the world with data's light.

So hail the science, sharp and new, That finds the meaning, fresh and true. In bytes and bits, a story lies, Reflected in intelligent eyes.

Using Gen AI for EDA¶

We could use the model to help analyze our data.

In [23]:
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_California")[1]
df
Out[23]:
Name City County Enrollment[1] Fall 2022 Founded Athletics
0 University of California, Berkeley Berkeley Alameda 45307 1869 NCAA Div. I (ACC, MPSF, America East)
1 University of California, Davis Davis Yolo 39679 1905 NCAA Div. I (Big Sky, MPSF, Big West, America East)
2 University of California, Irvine Irvine Orange 35937 1965 NCAA Div. I (Big West, MPSF, GCC)
3 University of California, Los Angeles Los Angeles Los Angeles 46430 1882* NCAA Div. I (Big Ten, MPSF)
4 University of California, Merced Merced Merced 9103 2005 NAIA (Cal Pac)
5 University of California, Riverside Riverside Riverside 26809 1954 NCAA Div. I (Big West)
6 University of California, San Diego San Diego San Diego 42006 1960 NCAA Div. I (Big West, MPSF)
7 University of California, Santa Barbara Santa Barbara Santa Barbara 26420 1891** NCAA Div. I (Big West, MPSF, GCC)
8 University of California, Santa Cruz Santa Cruz Santa Cruz 19478 1965 NCAA Div. III (C2C, ASC)
In [24]:
fast_model = genai.GenerativeModel("gemini-1.5-flash-8b")
In [25]:
prompt = "What is the mascot of {school}? Answer by only providing the mascot."
df['mascot'] = df['Name'].apply(
    lambda x: fast_model.generate_content(prompt.format(school=x)).text)
df
Out[25]:
Name City County Enrollment[1] Fall 2022 Founded Athletics mascot
0 University of California, Berkeley Berkeley Alameda 45307 1869 NCAA Div. I (ACC, MPSF, America East) Grizzly Bear\n
1 University of California, Davis Davis Yolo 39679 1905 NCAA Div. I (Big Sky, MPSF, Big West, America East) Aggie\n
2 University of California, Irvine Irvine Orange 35937 1965 NCAA Div. I (Big West, MPSF, GCC) Anteater\n
3 University of California, Los Angeles Los Angeles Los Angeles 46430 1882* NCAA Div. I (Big Ten, MPSF) Bruin\n
4 University of California, Merced Merced Merced 9103 2005 NAIA (Cal Pac) Merced Miner\n
5 University of California, Riverside Riverside Riverside 26809 1954 NCAA Div. I (Big West) Big Red\n
6 University of California, San Diego San Diego San Diego 42006 1960 NCAA Div. I (Big West, MPSF) Triton\n
7 University of California, Santa Barbara Santa Barbara Santa Barbara 26420 1891** NCAA Div. I (Big West, MPSF, GCC) Gaucho\n
8 University of California, Santa Cruz Santa Cruz Santa Cruz 19478 1965 NCAA Div. III (C2C, ASC) Banana Slug\n
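Calling an API inside .apply issues one request per row, so on free tiers you may hit rate limits for larger tables. A sketch that adds a pause between calls (the one-second delay is arbitrary):

import time

def ask_mascot(school):
    response = fast_model.generate_content(prompt.format(school=school))
    time.sleep(1)  # crude rate limiting between requests
    return response.text.strip()

# df['mascot'] = df['Name'].apply(ask_mascot)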

More EDA with OpenAI¶

In [26]:
from langchain_openai import OpenAI
openai_key = open("openai.key", "r").readline().strip()
client = OpenAI(openai_api_key=openai_key,
                model_name="gpt-3.5-turbo-instruct")
In [27]:
# Simulating student feedback data
feedback_data = {
    'StudentID': [1, 2, 3, 4, 5],
    'Feedback': [
        'Great class, learned a lot! But I really did not like PCA.',
        'The course was very informative and well-structured. Would prefer if lectures went faster. ',
        'I found the assignments challenging but rewarding. But the midterm was brutal.',
        'The lectures were engaging and the instructor was very knowledgeable.',
        'I struggled with the linear algebra. I would recommend this class to anyone interested in data science.'
    ],
    'Rating': [5, 4, 4, 5, 5]
}
feedback_df = pd.DataFrame(feedback_data)
feedback_df
Out[27]:
StudentID Feedback Rating
0 1 Great class, learned a lot! But I really did not like PCA. 5
1 2 The course was very informative and well-structured. Would prefer if lectures went faster. 4
2 3 I found the assignments challenging but rewarding. But the midterm was brutal. 4
3 4 The lectures were engaging and the instructor was very knowledgeable. 5
4 5 I struggled with the linear algebra. I would recommend this class to anyone interested in data science. 5
In [28]:
output_schema = {
        "type": "json_schema",
        "json_schema": {
            "name": "issue_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "Issue": {
                        "description": "Any issues or concerns the user raised about the class.",
                        "type": "string"
                    },
                    "Liked": {
                        "description": "Any things the user liked about the class.",
                        "type": "string"
                    }
                },
                "additionalProperties": False
            }
        }
    }
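Note that output_schema is written in the OpenAI structured-output (response_format) style, but it is never actually passed to the model below; the completions-style gpt-3.5-turbo-instruct endpoint does not accept it, which is why process_feedback falls back to regex parsing. With a chat model you could use the schema directly; a sketch, assuming access to a chat-completions model (gpt-4o-mini here is just an example):

from openai import OpenAI as OpenAIClient

chat_client = OpenAIClient(api_key=openai_key)
completion = chat_client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any chat model supporting structured outputs
    messages=[{"role": "user",
               "content": 'Extract the issues and likes. Feedback: "Great class, but PCA was rough."'}],
    response_format=output_schema,
)
print(completion.choices[0].message.content)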

import re, json

def process_feedback(feedback):
    prompt = f"""Extract the following information in JSON format:
    {{
  "Issue": "Any issues or concerns the user raised about the class.",
  "Liked": "Any things the user liked about the class."
  }}

  Feedback: "{feedback}"
"""
    response = client.invoke(prompt)
    # Pull the first JSON object out of the raw completion text.
    json_match = re.search(r"\{.*\}", response, re.DOTALL)
    try:
        return json.loads(json_match.group(0)) if json_match else {"Issue": "", "Liked": ""}
    except json.JSONDecodeError:
        return {"Issue": "", "Liked": ""}
In [29]:
responses = feedback_df["Feedback"].apply(process_feedback)
responses
Out[29]:
0                                                     {'Issue': 'I really did not like PCA.', 'Liked': 'Great class, learned a lot!'}
1                 {'Issue': 'Would prefer if lectures went faster.', 'Liked': 'The course was very informative and well-structured.'}
2                                 {'Issue': 'The midterm was brutal.', 'Liked': 'I found the assignments challenging but rewarding.'}
3                                   {'Issue': None, 'Liked': 'The lectures were engaging and the instructor was very knowledgeable.'}
4    {'Issue': 'I struggled with the linear algebra.', 'Liked': 'I would recommend this class to anyone interested in data science.'}
Name: Feedback, dtype: object
In [30]:
pd.set_option('display.max_colwidth', None)
feedback_df.join(pd.DataFrame(responses.to_list()))
Out[30]:
StudentID Feedback Rating Issue Liked
0 1 Great class, learned a lot! But I really did not like PCA. 5 I really did not like PCA. Great class, learned a lot!
1 2 The course was very informative and well-structured. Would prefer if lectures went faster. 4 Would prefer if lectures went faster. The course was very informative and well-structured.
2 3 I found the assignments challenging but rewarding. But the midterm was brutal. 4 The midterm was brutal. I found the assignments challenging but rewarding.
3 4 The lectures were engaging and the instructor was very knowledgeable. 5 None The lectures were engaging and the instructor was very knowledgeable.
4 5 I struggled with the linear algebra. I would recommend this class to anyone interested in data science. 5 I struggled with the linear algebra. I would recommend this class to anyone interested in data science.