Create a Cloud SQL instance (see the example commands after this list)
Create database tables by importing .sql files from Cloud Storage
Populate the tables by importing .csv files from Cloud Storage
Allow access to Cloud SQL
Explore the rentals data using SQL statements from Cloud Shell
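A minimal sketch of how the first three tasks could be scripted from Cloud Shell (the instance name rentals, region us-central1, MySQL version, and the .sql/.csv locations in your own bucket are assumptions for illustration):
gcloud sql instances create rentals \
    --database-version=MYSQL_5_7 --tier=db-n1-standard-1 --region=us-central1
# import the table-creation statements (a .sql file containing the SQL below) and the CSV data
gcloud sql import sql rentals gs://$DEVSHELL_PROJECT_ID/table_creation.sql --quiet
gcloud sql import csv rentals gs://$DEVSHELL_PROJECT_ID/accommodation.csv \
    --database=recommendation_spark --table=Accommodation --quiet
gcloud sql import csv rentals gs://$DEVSHELL_PROJECT_ID/rating.csv \
    --database=recommendation_spark --table=Rating --quiet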
#================================================
CREATE DATABASE IF NOT EXISTS recommendation_spark;
USE recommendation_spark;
DROP TABLE IF EXISTS Recommendation;
DROP TABLE IF EXISTS Rating;
DROP TABLE IF EXISTS Accommodation;
CREATE TABLE IF NOT EXISTS Accommodation
(
id varchar(255),
title varchar(255),
location varchar(255),
price int,
rooms int,
rating float,
type varchar(255),
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS Rating
(
userId varchar(255),
accoId varchar(255),
rating int,
PRIMARY KEY(accoId, userId),
FOREIGN KEY (accoId)
REFERENCES Accommodation(id)
);
CREATE TABLE IF NOT EXISTS Recommendation
(
userId varchar(255),
accoId varchar(255),
prediction float,
PRIMARY KEY(userId, accoId),
FOREIGN KEY (accoId)
REFERENCES Accommodation(id)
);
SHOW DATABASES;
root pw: r*****s0987
--code to connect to the Cloud SQL database from Cloud Shell:
gcloud sql connect rentals --user=root --quiet
--code to create a bucket and copy the data files into it:
echo "Creating bucket: gs://$DEVSHELL_PROJECT_ID"
gsutil mb gs://$DEVSHELL_PROJECT_ID
echo "Copying data to our storage from public dataset"
gsutil cp gs://cloud-training/bdml/v2.0/data/accommodation.csv gs://$DEVSHELL_PROJECT_ID
gsutil cp gs://cloud-training/bdml/v2.0/data/rating.csv gs://$DEVSHELL_PROJECT_ID
echo "Show the files in our bucket"
gsutil ls gs://$DEVSHELL_PROJECT_ID
echo "View some sample data"
gsutil cat gs://$DEVSHELL_PROJECT_ID/accommodation.csv
#===========================================================
Lab 1: Use Dataproc to train the recommendations machine learning model based on users' previous ratings. You then apply that model to create a list of recommendations for every user in the database. In this lab, you will:
Launch Dataproc (see the example command after this list)
Train and apply an ML model written in PySpark to create product recommendations
Explore the inserted rows in Cloud SQL
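A minimal sketch of launching the cluster from Cloud Shell, assuming the cluster name, zone, and worker count used in the authorization script below (machine types are illustrative):
gcloud dataproc clusters create rentals \
    --region=us-central1 --zone=us-central1-a \
    --master-machine-type=n1-standard-2 --worker-machine-type=n1-standard-2 \
    --num-workers=2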
#================================================
echo "Authorizing Cloud Dataproc to connect with Cloud SQL"
CLUSTER=rentals
CLOUDSQL=rentals
ZONE=us-central1-a
NWORKERS=2
machines="$CLUSTER-m"
for w in `seq 0 $(($NWORKERS - 1))`; do
machines="$machines $CLUSTER-w-$w"
done
echo "Machines to authorize: $machines in $ZONE ... finding their IP addresses"
ips=""
for machine in $machines; do
IP_ADDRESS=$(gcloud compute instances describe $machine --zone=$ZONE --format='value(networkInterfaces.accessConfigs[].natIP)' | sed "s/\[u'//g" | sed "s/'\]//g" )/32
echo "IP address of $machine is $IP_ADDRESS"
if [ -z $ips ]; then
ips=$IP_ADDRESS
else
ips="$ips,$IP_ADDRESS"
fi
done
echo "Authorizing [$ips] to access cloudsql=$CLOUDSQL"
gcloud sql instances patch $CLOUDSQL --authorized-networks $ips
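To double-check which networks ended up authorized, something like the following should work (the --format projection is an assumption about the describe output structure):
gcloud sql instances describe $CLOUDSQL \
    --format="yaml(settings.ipConfiguration.authorizedNetworks)"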
#===========================================================
Your data science team has created a recommendation model using Apache Spark, written in Python. Let's copy it into Cloud Shell so we can edit it before staging it in our bucket and submitting it.
#================================================
gsutil cp gs://cloud-training/bdml/v2.0/model/train_and_apply.py train_and_apply.py
cloudshell edit train_and_apply.py
#!/usr/bin/env python
import os
import sys
import pickle
import itertools
from math import sqrt
from operator import add
from os.path import join, isfile, dirname
from pyspark import SparkContext, SparkConf, SQLContext
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
from pyspark.sql.types import StructType, StructField, StringType, FloatType
# MAKE EDITS HERE
CLOUDSQL_INSTANCE_IP = '34.68.30.28'   # <---- CHANGE (use the IP of your Cloud SQL server)
CLOUDSQL_DB_NAME = 'recommendation_spark' # <--- leave as-is
CLOUDSQL_USER = 'root'  # <--- leave as-is
CLOUDSQL_PWD  = ''  # <---- CHANGE (your Cloud SQL root password)

# DO NOT MAKE EDITS BELOW
conf = SparkConf().setAppName("train_model")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

jdbcDriver = 'com.mysql.jdbc.Driver'
jdbcUrl    = 'jdbc:mysql://%s:3306/%s?user=%s&password=%s' % (CLOUDSQL_INSTANCE_IP, CLOUDSQL_DB_NAME, CLOUDSQL_USER, CLOUDSQL_PWD)

# checkpointing helps prevent stack overflow errors
sc.setCheckpointDir('checkpoint/')

# Read the ratings and accommodations data from Cloud SQL
dfRates = sqlContext.read.format('jdbc').options(driver=jdbcDriver, url=jdbcUrl, dbtable='Rating', useSSL='false').load()
dfAccos = sqlContext.read.format('jdbc').options(driver=jdbcDriver, url=jdbcUrl, dbtable='Accommodation', useSSL='false').load()
print("read ...")

# train the model: you could tune these numbers, but these are reasonable choices
model = ALS.train(dfRates.rdd, 20, 20)
print("trained ...")

# use this model to predict what the user would rate accommodations that she has not rated
allPredictions = None
for USER_ID in range(0, 100):
    dfUserRatings = dfRates.filter(dfRates.userId == USER_ID).rdd.map(lambda r: r.accoId).collect()
    rddPotential  = dfAccos.rdd.filter(lambda x: x[0] not in dfUserRatings)
    pairsPotential = rddPotential.map(lambda x: (USER_ID, x[0]))
    predictions = model.predictAll(pairsPotential).map(lambda p: (str(p[0]), str(p[1]), float(p[2])))
    predictions = predictions.takeOrdered(5, key=lambda x: -x[2])  # top 5
    print("predicted for user={0}".format(USER_ID))
    if allPredictions is None:
        allPredictions = predictions
    else:
        allPredictions.extend(predictions)

# write the predicted ratings to the Recommendation table
schema = StructType([StructField("userId", StringType(), True), StructField("accoId", StringType(), True), StructField("prediction", FloatType(), True)])
dfToSave = sqlContext.createDataFrame(allPredictions, schema)
dfToSave.write.jdbc(url=jdbcUrl, table='Recommendation', mode='overwrite')
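Once the edits are saved, the script can be staged to the bucket and submitted to the cluster; a sketch, assuming the cluster rentals in us-central1 and that the MySQL JDBC driver is available on the Dataproc nodes:
gsutil cp train_and_apply.py gs://$DEVSHELL_PROJECT_ID/
gcloud dataproc jobs submit pyspark gs://$DEVSHELL_PROJECT_ID/train_and_apply.py \
    --cluster=rentals --region=us-central1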
#===========================================================
Lab 2: Predict Visitor Purchases with a Classification Model with BigQuery ML. In this lab, you learn to perform the following tasks:
Use BigQuery to find public datasets
Query and explore the ecommerce dataset
Create a training and evaluation dataset to be used for batch prediction
Create a classification (logistic regression) model in BQML
Evaluate the performance of your machine learning model
Predict and rank the probability that a visitor will make a purchase
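The statements in this section are run in the BigQuery console during the lab; as a sketch, queries against the same public dataset can also be run from Cloud Shell with the bq CLI, for example:
bq query --use_legacy_sql=false \
'SELECT COUNT(*) AS new_visit_sessions
 FROM `data-to-insights.ecommerce.web_analytics`
 WHERE totals.newVisits = 1'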
#================================================
#1. Train a model on the training dataset:
CREATE OR REPLACE MODEL `ecommerce.classification_model`
OPTIONS
(
model_type='logistic_reg',
labels = ['will_buy_on_return_visit']
)
AS
#standardSQL
SELECT
* EXCEPT(fullVisitorId)
FROM
# features
(SELECT
fullVisitorId,
IFNULL(totals.bounces, 0) AS bounces,
IFNULL(totals.timeOnSite, 0) AS time_on_site
FROM
`data-to-insights.ecommerce.web_analytics`
WHERE
totals.newVisits = 1
AND date BETWEEN '20160801' AND '20170430') # train on first 9 months
JOIN
(SELECT
fullvisitorid,
IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
FROM
`data-to-insights.ecommerce.web_analytics`
GROUP BY fullvisitorid)
USING (fullVisitorId)
;
#2. Evaluate the model's performance (ROC AUC, etc.):
SELECT
roc_auc,
CASE
WHEN roc_auc > .9 THEN 'good'
WHEN roc_auc > .8 THEN 'fair'
WHEN roc_auc > .7 THEN 'not great'
ELSE 'poor' END AS model_quality
FROM
ML.EVALUATE(MODEL ecommerce.classification_model, (
#Note on the CASE WHEN statement: the WHEN clauses are evaluated in order,
#so once the first condition is satisfied the remaining ones are not checked (they are exclusive).
#Each row of the joined table is evaluated separately, though, so all the labels might still show up.
SELECT
* EXCEPT(fullVisitorId)
FROM
# features
(SELECT
fullVisitorId,
IFNULL(totals.bounces, 0) AS bounces,
IFNULL(totals.timeOnSite, 0) AS time_on_site
FROM
`data-to-insights.ecommerce.web_analytics`
WHERE
totals.newVisits = 1
AND date BETWEEN '20170501' AND '20170630') # eval on 2 months
JOIN
(SELECT
fullvisitorid,
IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
FROM
`data-to-insights.ecommerce.web_analytics`
GROUP BY fullvisitorid)
USING (fullVisitorId)
));
#3. Predict whether the new visitor will buy on a later visit:
SELECT
*
FROM
ml.PREDICT(MODEL `ecommerce.classification_model_2`,
(
WITH all_visitor_stats AS (
SELECT
fullvisitorid,
IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
FROM `data-to-insights.ecommerce.web_analytics`
GROUP BY fullvisitorid
)
SELECT
CONCAT(fullvisitorid, '-',CAST(visitId AS STRING)) AS unique_session_id,
# labels
will_buy_on_return_visit,
MAX(CAST(h.eCommerceAction.action_type AS INT64)) AS latest_ecommerce_progress,
# behavior on the site
IFNULL(totals.bounces, 0) AS bounces,
IFNULL(totals.timeOnSite, 0) AS time_on_site,
totals.pageviews,
# where the visitor came from
trafficSource.source,
trafficSource.medium,
channelGrouping,
# mobile or desktop
device.deviceCategory,
# geographic
IFNULL(geoNetwork.country, "") AS country
FROM `data-to-insights.ecommerce.web_analytics`,
UNNEST(hits) AS h
JOIN all_visitor_stats USING(fullvisitorid)
WHERE
# only predict for new visits
totals.newVisits = 1
AND date BETWEEN '20170701' AND '20170801' # test 1 month
GROUP BY
unique_session_id,
will_buy_on_return_visit,
bounces,
time_on_site,
totals.pageviews,
trafficSource.source,
trafficSource.medium,
channelGrouping,
device.deviceCategory,
country
)
)
ORDER BY
predicted_will_buy_on_return_visit DESC;
#===========================================================
Create a dataset in BigQuery via the Console UI:
Create the dataset (the database) first; each dataset then holds its own separate tables/data. For example, a table with this schema:
#================================================
ride_id:string,
point_idx:integer,
latitude:float,
longitude:float,
timestamp:timestamp,
meter_reading:float,
meter_increment:float,
ride_status:string,
passenger_count:integer
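Alternatively, the dataset and a table with this schema can be created from the command line; a sketch with bq (the dataset name taxirides and table name realtime are placeholders, not taken from this lab):
bq mk taxirides
bq mk --table taxirides.realtime \
ride_id:string,point_idx:integer,latitude:float,longitude:float,timestamp:timestamp,meter_reading:float,meter_increment:float,ride_status:string,passenger_count:integer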
#===========================================================
Classify Images with Pre-built ML Models using Cloud Vision API and AutoML
In this lab you will upload images to Cloud Storage and use them to train a custom model to recognize different types of clouds (cumulus, cumulonimbus, etc.)
1. Upload a labeled dataset to Google Cloud Storage and connect it to AutoML Vision with a CSV label file.
2. Train a model with AutoML Vision and evaluate its accuracy.
3. Generate predictions on your trained model.
First step: after enabling the Cloud AutoML Vision API, set the project ID and username from Cloud Shell:
export PROJECT_ID=qwiklabs-gcp-02-1258a4b7a7ca
export QWIKLABS_USERNAME=student-02-ca7927dfff9d@qwiklabs.net
Now create a Cloud Storage bucket for the images you will use in testing by running the following command:
gsutil mb -p $PROJECT_ID \
-c regional \
-l us-central1 \
gs://$PROJECT_ID-vcm/
Before you add the cloud images, create an environment variable with the name of your bucket by running the following command in Cloud Shell:
export BUCKET=$PROJECT_ID-vcm
Note that you may sometimes need a trailing "/" on the bucket path. You can double-check that the variable was assigned correctly from the Cloud Shell command line: echo $BUCKET
The training images are publicly available in a Cloud Storage bucket. Use the gsutil command line utility for Cloud Storage to copy the training images into your bucket:
gsutil -m cp -r gs://automl-codelab-clouds/* gs://${BUCKET}/
When the images finish copying, click the Refresh button at the top of the Cloud Storage browser, then click your bucket name. You should see 3 folders of photos, one for each of the 3 cloud types to be classified.
Now that your training data is in Cloud Storage, you need a way for AutoML Vision to access it. You'll create a CSV file where each row contains a URL to a training image and the associated label for that image. This CSV file has been created for you; you just need to update it with your bucket name.
Run the following command to copy the file to your Cloud Shell instance:
gsutil cp gs://automl-codelab-metadata/data.csv .
Then update the CSV with the files in your project:
sed -i -e "s/placeholder/${BUCKET}/g" ./data.csv
--this replaces the string "placeholder" with the bucket name
Replace text at line 3
sed -e "3s/oldtext/newtext/" infile > outfile
Replace all occurrences
sed -e "s/oldtext/newtext/g" infile > outfile
Change aaa to bbb in a shell variable
outfile=`echo $infile | sed -e "s/aaa/bbb/"`
Print line i
sed -n "$i"p infile
Now you're ready to upload this file to your Cloud Storage bucket:
gsutil cp ./data.csv gs://${BUCKET}
Step 2: Create the training dataset in the AutoML Vision UI:
At the top of the console, click + NEW DATASET.
Type "clouds" for the Dataset name.
Leave "Single-label Classification" checked.
The training CSV should contain rows like:
[set,]image_path[,label]
TRAIN,gs://My_Bucket/sample1.jpg,cat
TEST,gs://My_Bucket/sample2.jpg,dog
After the training finishes, we can get predictions by clicking "Test and Use" or by running this in Cloud Shell:
python predict.py YOUR_LOCAL_IMAGE_FILE 833728172698 ICN1336966556957016064
or via Python:
#===========================================================
import sys
from google.cloud import automl_v1beta1
from google.cloud.automl_v1beta1.proto import service_pb2
# 'content' is base-64-encoded image data.
def get_prediction(content, project_id, model_id):
prediction_client = automl_v1beta1.PredictionServiceClient()
name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
payload = {'image': {'image_bytes': content }}
params = {}
request = prediction_client.predict(name, payload, params)
return request # waits till request is returned
if __name__ == '__main__':
file_path = sys.argv[1]
project_id = sys.argv[2]
model_id = sys.argv[3]
with open(file_path, 'rb') as ff:
content = ff.read()
print(get_prediction(content, project_id, model_id))
#===========================================================
In summary, there are 3 ways to do image classification on GCP:
1. Use BigQuery to run the image classification training and prediction in SQL.
2. Use the AutoML Vision API to load the image data (a CSV file with URLs to images in Cloud Storage) and explore/train/evaluate/predict.
3. Use a Deep Learning VM / AI Platform / ML Engine Jupyter notebook (where you can write your own model with Keras) to write Python code to explore/train/evaluate/predict.