Thursday, December 26, 2019

Python study notes 5: Github from beginning to advanced

What is an SSH key? How do we generate an SSH key for GitHub?
What's the preparation work before using git?
What are the layman's steps to start working with git?

What is an SSH key? How do we use SSH with GitHub?
What's the preparation work before using git?

Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network in a client-server architecture. With SSH keys, you can connect to GitHub Enterprise without supplying your username or password at each visit.

An SSH key is actually a pair of keys: a public key and a private key (here an RSA key pair). Why is it called an RSA key?
RSA is named after the initials of its inventors, Ron Rivest, Adi Shamir and Leonard Adleman.
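To make the "public/private key pair" idea concrete, here is textbook RSA with tiny primes (the classic p = 61, q = 53, e = 17 example). This is a toy for intuition only: real SSH keys use 2048 to 4096-bit moduli with padding, and SSH authentication uses signatures rather than this bare arithmetic. The helper names make_toy_keypair and apply_key are hypothetical.

```python
from math import gcd


def make_toy_keypair(p: int, q: int, e: int):
    """Build a textbook RSA key pair from two small primes (toy only, no padding)."""
    n = p * q                   # modulus, shared by both keys
    phi = (p - 1) * (q - 1)     # Euler's totient of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p-1)(q-1)")
    d = pow(e, -1, phi)         # private exponent: modular inverse of e (Python 3.8+)
    return (n, e), (n, d)       # (public key, private key)


def apply_key(message: int, key) -> int:
    """Both directions are the same operation: message**k mod n."""
    n, k = key
    return pow(message, k, n)


public, private = make_toy_keypair(61, 53, 17)
ciphertext = apply_key(65, public)          # anyone holding the public key can do this
recovered = apply_key(ciphertext, private)  # only the private-key holder can undo it
print(public, private, ciphertext, recovered)
```

It prints (3233, 17) (3233, 2753) 2790 65. Applying the private key first and the public key second also round-trips, which is the idea behind a signature: GitHub can check with your public key that only your private key could have produced it.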

There is a relatively easy way to sync local files with the remote git repository: GitHub Desktop.
Instead of typing the git commands below, you can use GitHub Desktop to update, download and upload files between the two places. After you make changes to the files in the GitHub Desktop folder, simply click "Push origin" to update the remote GitHub repository.

Difference between the commands “gcloud compute ssh” and “ssh”

gcloud compute ssh installs the key pair as ~/.ssh/google_compute_engine[.pub]. 

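ssh uses ~/.ssh/identity[.pub] by default. (That name comes from the old SSH protocol 1; current OpenSSH clients look for ~/.ssh/id_rsa, ~/.ssh/id_ecdsa, ~/.ssh/id_ed25519 and similar files instead, which is why passing the key with -i, as shown below, is often simpler.)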

You can either copy the google key pair to the default name:

$ cp ~/.ssh/google_compute_engine ~/.ssh/identity
$ cp ~/.ssh/ ~/.ssh/
or you can just specify the private key to use when you invoke ssh:

$ ssh -i ~/.ssh/google_compute_engine instance_name
By default, a new Google Compute Engine (GCE) VM instance does not have SSH keys pre-assigned to it, so you cannot "retrieve" them as they don't exist—it's up to you to create them, or use a tool like gcloud (see below) which will prompt you to create them if you don't have SSH keys yet.

You have several options for connecting to your newly-created GCE VM.

One option is to connect using the "SSH" button in the Developer Console GUI next to the instance in the list of instances, which will open a browser window and a terminal session to the instance.

If you would like to connect via an SSH client on the command line, you can use the gcloud tool (part of the Google Cloud SDK):

gcloud compute ssh example-instance
You can see the full set of flags and options on the gcloud compute ssh help page, along with several examples.

If you don't already have SSH keys, it will prompt you to create them and then connect to the instance. If you already have keys, it will use your existing SSH keys and transfer the public key to the instance.

By default, gcloud expects keys to be located at the following paths:

$HOME/.ssh/google_compute_engine – private key
$HOME/.ssh/ – public key
If you want to reuse keys from a different location with gcloud, consider either making symlinks or pointing gcloud there using the --ssh-key-file flag.
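Because ssh and gcloud each look for their own default file names, it can help to see which key pairs actually exist in ~/.ssh. The sketch below is a hypothetical helper, find_key_pairs, that lists every private key with a matching .pub file, so names like id_rsa and google_compute_engine both show up. It only inspects file names and never reads key contents.

```python
from pathlib import Path


def find_key_pairs(ssh_dir: str = "~/.ssh") -> list[str]:
    """Return names of private keys in ssh_dir that have a matching <name>.pub file."""
    folder = Path(ssh_dir).expanduser()
    if not folder.is_dir():
        return []
    pairs = []
    for pub in folder.glob("*.pub"):
        private = pub.with_suffix("")  # id_rsa.pub -> id_rsa
        if private != pub and private.is_file():
            pairs.append(private.name)
    return sorted(pairs)


if __name__ == "__main__":
    for name in find_key_pairs():
        # each name can be passed to ssh as: ssh -i ~/.ssh/<name> instance_name
        print(name)
```

A file such as orphan.pub without its private half is skipped, since ssh -i and gcloud --ssh-key-file both need the private key.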

Before we can actually run git commands between the remote GitHub repository and our local files, we need to complete the following 3 preparation steps.

Step 1: Generating a new SSH key:
1. Open a Git Bash terminal, or open a terminal from Jupyter Notebook (in SageMaker, if you are on AWS).
2. Paste the text below, substituting in your GitHub Enterprise email address:

ssh-keygen -t rsa -b 4096 -C ""
This creates a new ssh key, using the provided email as a label.

When you're prompted to "Enter a file in which to save the key," press Enter. This accepts the default file location. Double-check that the path ends in .ssh/id_rsa.

Enter a file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]

or Enter a file in which to save the key (/c/Users/you/.ssh/id_rsa):[Press enter]

or Enter a file in which to save the key (/home/you/.ssh/id_rsa): [Press enter]

At the prompt, type a secure passphrase.

Enter passphrase (empty for no passphrase): [Type a passphrase]

Enter same passphrase again: [Type passphrase again]

Step 2: Adding your SSH key to the ssh-agent (the local side of a GitHub push).

1. Start another terminal (for example from Jupyter Notebook in SageMaker on AWS).

# start the ssh-agent in the background
eval $(ssh-agent -s)
Agent pid 59566

2. Add your SSH private key to the ssh-agent. If you created your key with a different name,
or if you are adding an existing key that has a different name, replace id_rsa in the command with the name of your private key file.

ssh-add ~/.ssh/id_rsa
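Why eval $(ssh-agent -s)? ssh-agent -s does not change your shell by itself; it prints Bourne-shell commands that set SSH_AUTH_SOCK and SSH_AGENT_PID, and eval runs them so that later commands such as ssh-add can find the agent. The sketch below parses that kind of output in Python to show what the variables are. The sample text is illustrative (the socket path is made up), and parse_agent_output is a hypothetical helper.

```python
import re

# Illustrative output of `ssh-agent -s` (socket path and pids are made up).
SAMPLE_OUTPUT = """SSH_AUTH_SOCK=/tmp/ssh-abc123/agent.59565; export SSH_AUTH_SOCK;
SSH_AGENT_PID=59566; export SSH_AGENT_PID;
echo Agent pid 59566;
"""


def parse_agent_output(text: str) -> dict[str, str]:
    """Extract the SSH_*=value assignments that `eval` would apply to the shell."""
    return dict(re.findall(r"^(SSH_[A-Z_]+)=([^;]+);", text, flags=re.MULTILINE))


env = parse_agent_output(SAMPLE_OUTPUT)
print(env["SSH_AGENT_PID"])  # the pid the shell reports as "Agent pid ..."
```

In a Python process you could call os.environ.update(env) before running ssh-add through subprocess, which is exactly what eval does for the shell.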

Step 3: Adding a new SSH key to your GitHub account

1. Copy the SSH key to your clipboard.
If your SSH key file has a different name than the example code, modify the filename to match your current setup.

cat ~/.ssh/

You can also list all the public keys saved in that folder with: ls ~/.ssh/*.pub
When copying your key, don't add any newlines or whitespace. Right-click and copy everything starting from "ssh-rsa ******" through the end of "".

2. Add SSH key to your GitHub account:
In the upper-right corner of any page, click your profile photo, then click Settings.
In the user settings sidebar, click SSH and GPG keys, then click New SSH key (or Add SSH key).

In the "Title" field, add a descriptive label for the new key. For example, if you're using a personal Mac, you might call this key "Personal MacBook Air".

Paste your key into the "Key" field.
Click Add SSH key.

If prompted, confirm your GitHub Enterprise password.
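A public key line has three space-separated parts: the key type (ssh-rsa), the base64-encoded key data, and an optional comment (the email you passed to -C). That is why a stray newline breaks the key when you paste it. ssh-keygen -lf on a .pub file prints a SHA256 fingerprint, which is the base64 of the SHA-256 digest of the decoded key data with the trailing = padding removed; recent GitHub versions show the same form next to each key. The sketch below uses hypothetical helper names, and the sample key data is a truncated placeholder, not a real key.

```python
import base64
import hashlib


def parse_public_key(line: str) -> tuple[str, bytes, str]:
    """Split an OpenSSH public key line into (key type, raw key bytes, comment)."""
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-data> [comment]' on one line")
    key_type, b64_data = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    # validate=True rejects stray characters; binascii.Error subclasses ValueError
    blob = base64.b64decode(b64_data, validate=True)
    return key_type, blob, comment


def sha256_fingerprint(line: str) -> str:
    """Fingerprint in the `ssh-keygen -lf` style: SHA256:<base64 digest, no padding>."""
    _, blob, _ = parse_public_key(line)
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


sample = "ssh-rsa AAAAB3NzaC1yc2E= you@example.com"  # placeholder, not a usable key
print(parse_public_key(sample)[0], sha256_fingerprint(sample))
```

Comparing this value with the fingerprint GitHub lists is a quick way to confirm that the key you pasted is the one on your machine.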

Here are the steps most of us need to do frequently.
Now that we have generated the SSH key and added it on both the local side (ssh-agent) and the remote side (the GitHub account), it's time to start the git work.

1. Open a Git Bash terminal, or open a terminal from Jupyter Notebook (in SageMaker, if you are on AWS).

2. Start the ssh-agent program: eval $(ssh-agent -s)

ssh-agent is a program that runs in the background and keeps your key loaded in memory, so you don't need to enter your passphrase every time you use the key. The nifty thing is that you can also let servers you log into use your local ssh-agent (agent forwarding), as if the agent were running on the server. This is sort of like asking a friend to enter their password so that you can use their computer.

3. add the ssh key: ssh-add ~/SageMaker/.ssh/id_rsa

4a. print out the current working directory: pwd

4b. To remove a folder, you can use: rm -r folder_name (or rmdir folder_name, which only works on an empty folder)

4c. change the directory to the one that you want to pull to: cd /home/ec2-user/SageMaker/test1

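4d. Run ls -la .git or git remote -v to check whether the folder already has a git repository and a remote. If no remote exists, you can add one with:
git remote add origin
4e. After checking the git status, create and switch to a new branch with git checkout -b branch_name (without -b, git checkout branch_name switches to an existing branch).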

5. Check which folders/files are inside: ls

6. Make a directory/folder: mkdir name_folder

7. change directory to that folder: cd /home/ec2-user/SageMaker/test1

8a. You can always run git status to see the state of the folder's repository: changed, staged and untracked files, and whether your branch is ahead of or behind the remote branch (as of the last fetch).

8b. If you just want to replace one local file with the version from the master branch on GitHub, first run git fetch:

git fetch --all # download the latest commits from all remotes (your local files are not changed yet)

Then run git checkout via:
git checkout origin/master -- container/serve/Dockerfile

git checkout origin/master --

git checkout origin/master --

git checkout origin/master --

Don't forget that -- in the middle.

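Use this one to update your current branch with everything fetched from the remote master:
git merge origin/master  # merges origin/master into your current branch; it does not force-overwrite local changes

Note that the message "Already up-to-date" is not an error: it means your current branch already contains everything in origin/master as of your last fetch. If you skipped git fetch, run git pull instead, which fetches and merges in one step.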
git fetch is the command that says "bring my local copy of the remote repository up to date."

git pull says "bring the changes in the remote repository to where I keep my own code."

When you use pull, Git tries to automatically do your work for you. It is context sensitive, so Git will merge any pulled commits into the branch you are currently working in. pull automatically merges the commits without letting you review them first. If you don't closely manage your branches, you may run into frequent conflicts.

When you fetch, Git gathers any commits from the target branch that do not exist in your current branch and stores them in your local repository. However, it does not merge them with your current branch. This is particularly useful if you need to keep your repository up to date, but are working on something that might break if you update your files. To integrate the commits into your master branch, you use merge.

8c. If you just want to download all the code: git clone git@ghe.*****.com:ked-dataScience/uvi-delivery.git
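The fetch/merge/pull relationship from 8b can be sketched with a toy model in which a branch is just a list of commit IDs. This is a simplification for intuition (real git stores a commit graph, and a merge of diverged branches creates a merge commit or reports conflicts); the class ToyRepo and its methods are hypothetical. fetch only updates the remote-tracking copy origin/master, merge then moves your branch forward, and pull is fetch followed by merge.

```python
class ToyRepo:
    """Toy model: each branch is a list of commit IDs, oldest first."""

    def __init__(self, remote: list[str]):
        self.remote = remote                # the branch on GitHub (shared list)
        self.origin_master = list(remote)   # remote-tracking copy: origin/master
        self.master = list(remote)          # your local branch

    def fetch(self) -> None:
        """Update origin/master only; your own branch and files are untouched."""
        self.origin_master = list(self.remote)

    def merge(self) -> str:
        """Merge origin/master into master (fast-forward only in this toy)."""
        if self.master[: len(self.origin_master)] == self.origin_master:
            return "Already up to date."    # nothing new in origin/master
        if self.origin_master[: len(self.master)] == self.master:
            self.master = list(self.origin_master)
            return "Fast-forward"
        return "Diverged: real git would create a merge commit or report conflicts."

    def pull(self) -> str:
        """pull = fetch + merge in one step."""
        self.fetch()
        return self.merge()


github = ["a1", "b2"]
repo = ToyRepo(github)
github.append("c3")                     # a teammate pushes a new commit
print(repo.merge())                     # Already up to date.  (nothing fetched yet)
repo.fetch()
print(repo.origin_master, repo.master)  # ['a1', 'b2', 'c3'] ['a1', 'b2']
print(repo.merge())                     # Fast-forward
```

Note how merging before fetching reports "Already up to date" even though GitHub already has a newer commit.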

If we want to create a branch in the remote GitHub repository, make some changes in the local repository, and then save them back to that remote branch, we can do the following:

1. Check all the available branches: git branch -a

2. Create and switch to the branch you want to work on: git checkout -b branch_name (drop -b if the branch already exists)

3. If the folder is not a git repository yet, initialize one: git init . (this only creates a local repository; the connection to the remote is made with git remote add origin or by cloning)

4. After you make some changes to the local repository, push them to the remote GitHub. Usually you push to your own branch first, then open a pull request on GitHub to merge it into the master branch:

4a. git status to see what file has changed.

4b. git add file_name to stage that file locally.

4c. git commit -m "what is the change" to commit locally.

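4d. git push origin branch_name -f
to push to the remote GitHub.
Use -f to force the update to the branch; otherwise you might get an error message like:
"failed to push some refs to 'git@ghe.*****.com:directory/folder1.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes "
Be careful: -f overwrites the remote branch, so any commits on it that you have not pulled are lost. If other people push to the same branch, the safer fix is what the hint suggests: run git pull first to integrate the remote changes, then push without -f.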
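The "tip of your current branch is behind" hint comes from the rule that a normal push must be a fast-forward: every commit already on the remote branch must also be in what you push. The toy model below is a hypothetical push function that treats branches as lists of commit IDs (a simplification of git's commit graph). It shows why the push is rejected and what -f does instead: it replaces the remote branch, so remote commits you never pulled disappear from it.

```python
def push(local: list[str], remote: list[str], force: bool = False) -> list[str]:
    """Return the remote branch after a push (toy model of git push [-f])."""
    is_fast_forward = local[: len(remote)] == remote  # remote tip is in our history
    if is_fast_forward or force:
        return list(local)                            # remote now matches local
    raise RuntimeError(
        "rejected: the tip of your current branch is behind its remote counterpart"
    )


remote = ["a1", "b2", "c3"]  # c3 was pushed by a teammate
local = ["a1", "b2", "x9"]   # your commit, made without pulling c3 first

try:
    push(local, remote)
except RuntimeError as err:
    print(err)

print(push(local, remote, force=True))          # ['a1', 'b2', 'x9']: c3 is gone
# after integrating c3 first (e.g. git pull --rebase), a normal push works
print(push(["a1", "b2", "c3", "x9"], remote))   # ['a1', 'b2', 'c3', 'x9']
```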

After you finish everything in the terminal, you can use: exit to finish and exit.

If you want to delete a few files from a branch in the remote github, you can do the following:

1. git checkout branch1 (or git checkout master) to switch to the branch you want to change

2. git status #to double check

3. git rm Untitled.ipynb Untitled.csv

4. git commit -m 'remove a few files'

5. git push
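When you double-check with git status before deleting files, a script can read the machine-friendly form git status --porcelain, where each line is a two-letter status code (index column, then working-tree column), a space, and the path; ?? marks untracked files and D marks deletions. The parser below is a hypothetical helper run on a made-up sample; for real use you could pass it the stdout of subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True). It does not handle the quoted form git uses for unusual file names.

```python
def parse_porcelain(output: str) -> dict[str, list[str]]:
    """Group paths from `git status --porcelain` (v1) output by status category."""
    groups: dict[str, list[str]] = {"staged": [], "unstaged": [], "untracked": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        index_code, worktree_code, path = line[0], line[1], line[3:]
        if " -> " in path:                  # renames are shown as "old -> new"
            path = path.split(" -> ", 1)[1]
        if index_code == "?" and worktree_code == "?":
            groups["untracked"].append(path)
            continue
        if index_code != " ":
            groups["staged"].append(path)   # change recorded in the index
        if worktree_code != " ":
            groups["unstaged"].append(path) # change only in the working folder
    return groups


sample = "D  Untitled.ipynb\n M notebook.py\n?? Untitled.csv\n"
print(parse_porcelain(sample))
# {'staged': ['Untitled.ipynb'], 'unstaged': ['notebook.py'], 'untracked': ['Untitled.csv']}
```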

Some other git examples:

This example checks out the master branch, reverts the Makefile to two revisions back,
deletes hello.c by mistake, and gets it back from the index:
$ git checkout master   #don't use: git checkout -b master          (1)
$ git checkout master~2 Makefile  (2)
$ rm -f hello.c
$ git checkout hello.c            (3)

If you want to check out all C source files from the index (this discards any unstaged changes to them):
$ git checkout -- '*.c'   # restore all .c files
$ git checkout -- hello.c # restore only hello.c

If a previous git command crashed, then when you reconnect and run git add or git commit you might get the following error message:

"fatal: Unable to create '/home/ec2-user/folder_name/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue."

Then you can use the following to remove the lock file: rm -f .git/index.lock
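If you hit this often in a notebook environment, the same cleanup can be scripted. The sketch below is a hypothetical helper that only deletes .git/index.lock when the file is older than a threshold you choose, as a crude guard against removing the lock of a git command that is still running. It cannot verify that no git process is running, so checking that yourself, as the error message says, is still necessary.

```python
import time
from pathlib import Path


def remove_stale_index_lock(repo_dir: str, min_age_seconds: float = 60.0) -> bool:
    """Delete <repo_dir>/.git/index.lock if it is older than min_age_seconds.

    Returns True if a lock file was removed. The age check is only a heuristic.
    """
    lock = Path(repo_dir) / ".git" / "index.lock"
    if not lock.exists():
        return False
    age = time.time() - lock.stat().st_mtime
    if age < min_age_seconds:
        return False  # possibly a git command that is still running
    lock.unlink()
    return True
```

For example, remove_stale_index_lock("/home/ec2-user/folder_name") mirrors rm -f .git/index.lock run inside that folder.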

When we get the error message:
"git remote add origin
fatal: remote origin already exists."
you can use ls .git or git remote -v to see the current remotes.
We can delete a remote with git remote rm followed by its name, for example git remote rm origin, and then run git remote add origin again with the correct URL. The example below removes a remote named destination:

git remote -v
# View current remotes
origin (fetch)
origin (push)
destination (fetch)
destination (push)
git remote rm destination
# Remove remote
git remote -v
# Verify it's gone
origin (fetch)
origin (push)
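git remote -v prints one line per remote and direction: the name, the URL, then (fetch) or (push). When scripting the "remote origin already exists" fix, you might first check which remotes exist. The parser below is a hypothetical helper, tested on a made-up sample with placeholder URLs; real output would come from subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout.

```python
def parse_remotes(output: str) -> dict[str, dict[str, str]]:
    """Map remote name -> {"fetch": url, "push": url} from `git remote -v` output."""
    remotes: dict[str, dict[str, str]] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, direction = parts
        remotes.setdefault(name, {})[direction.strip("()")] = url
    return remotes


sample = (
    "origin\tgit@example.com:team/project.git (fetch)\n"
    "origin\tgit@example.com:team/project.git (push)\n"
    "destination\tgit@example.com:fork/project.git (fetch)\n"
    "destination\tgit@example.com:fork/project.git (push)\n"
)
remotes = parse_remotes(sample)
print(sorted(remotes))      # ['destination', 'origin']
print("origin" in remotes)  # True, so `git remote add origin ...` would fail
```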

How to create a tar archive with faked information:

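import tarfile

namelist = ["setup.py", "README.txt"]  # hypothetical: paths of existing files to archive

with tarfile.open("sample.tar.gz", "w:gz") as tar:
    for name in namelist:
        # build the header from the real file, but store it under a new folder name
        tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
        tarinfo.uid = 123
        tarinfo.gid = 456
        tarinfo.uname = "johndoe"
        tarinfo.gname = "fake"
        with open(name, "rb") as f:  # Python 3: open() replaces file()
            tar.addfile(tarinfo, f)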
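To check that the faked owner information really ends up in the archive, here is a self-contained variant that builds the archive in memory (io.BytesIO) from made-up file contents instead of files on disk, then reads it back. The file names and contents are hypothetical; tarfile.TarInfo and addfile are the same standard-library calls the example above relies on.

```python
import io
import tarfile

files = {"README.txt": b"hello\n", "src/main.py": b"print('hi')\n"}  # hypothetical contents

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for name, data in files.items():
        tarinfo = tarfile.TarInfo("fakeproj-1.0/" + name)
        tarinfo.size = len(data)  # addfile copies exactly `size` bytes
        tarinfo.uid, tarinfo.gid = 123, 456
        tarinfo.uname, tarinfo.gname = "johndoe", "fake"
        tar.addfile(tarinfo, io.BytesIO(data))

buffer.seek(0)  # rewind the in-memory archive before reading it back
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    for member in tar.getmembers():
        print(member.name, member.size, member.uname, member.uid)
```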

How to open a compressed tar file from an S3 bucket and list its contents.

import s3fs
import tarfile

fs = s3fs.S3FileSystem()
model_file_s3 = 's3://directory/output/model.tar.gz'

count = 0  # number of regular files
size = 0   # total size of the regular files, in bytes
with fs.open(model_file_s3, 'rb') as f, tarfile.open(fileobj=f, mode='r:gz') as tar:
    for tarinfo in tar:
        print(tarinfo.name, "is", tarinfo.size, "bytes in size and is")
        if tarinfo.isreg():
            print("a regular file.")
            count += 1
            size += tarinfo.size
        elif tarinfo.isdir():
            print("a directory.")
        else:
            print("something else.")
print('num of files', count)
print('total size of all files', size)
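The code above only lists the members. To actually unpack the archive, you can call extractall on the same tar object. Because a tar member name can contain .. or an absolute path, the sketch below (a hypothetical helper, safe_extract) refuses members whose names would land outside the target folder before extracting anything. The name check does not cover symlink targets, so when the tarfile module offers the filter="data" option (added in recent Python releases) the helper also passes it.

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, target_dir: str) -> list[str]:
    """Extract all members into target_dir, refusing names that escape it."""
    root = os.path.realpath(target_dir)
    members = tar.getmembers()
    for member in members:
        dest = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, dest]) != root:
            raise ValueError(f"unsafe path in archive: {member.name}")
    os.makedirs(root, exist_ok=True)
    if hasattr(tarfile, "data_filter"):   # Python versions with extraction filters
        tar.extractall(root, members=members, filter="data")
    else:
        tar.extractall(root, members=members)
    return [m.name for m in members]
```

For the S3 case you could call safe_extract(tar, "model_dir") inside the same with block that opened the archive, where "model_dir" is a hypothetical local folder name.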

error: src refspec remotes/origin/branch1 matches more than one
warning: refname 'remotes/origin/branch1' is ambiguous.
fatal: Ambiguous object name: 'remotes/origin/branch1'
fatal: The current branch remotes/origin/branch1 has no upstream branch

Solution: push the current commit (HEAD) to the remote branch by its explicit name, which avoids the ambiguous ref:
git push -f origin HEAD:branch1

How do we create a folder on GitHub from the browser by point-and-click?
Answer: Sometimes it's more convenient to create a folder on GitHub from the browser, but it is a little tricky.

1. Click the "Create new file" button.
2. In the file name box, type the folder name you want followed by a forward slash. For example, next to something like Github/Team/project, type "code/" (don't forget the slash at the end), and the name becomes a folder in the path instead of a file. Because git does not track empty folders, you then need to type a file name inside the new folder and commit that file for the folder to be saved.

Have you ever faced the situation where you perform a long-running task on a remote machine, and suddenly your connection drops, the SSH session is terminated, and your work is lost. Well, it has happened to all of us at some point, hasn’t it? Luckily, there is a utility called screen that allows us to resume the sessions.

Screen or GNU Screen is a terminal multiplexer. In other words, it means that you can start a screen session and then open any number of windows (virtual terminals) inside that session. Processes running in Screen will continue to run when their window is not visible even if you get disconnected.

Install Linux GNU Screen
The screen package is pre-installed on most Linux distros nowadays. You can check if it is installed on your system by typing:

screen --version

Install Linux Screen on Ubuntu and Debian

sudo apt update
sudo apt install screen

Basic Linux Screen Usage
Below are the most basic steps for getting started with screen:
1. On the command prompt, type screen.
2. Run the desired program.
3. Use the key sequence Ctrl-a + Ctrl-d to detach from the screen session.
4. Reattach to the screen session by typing screen -r. Note that you may need to do this from a different terminal tab.

To start a screen session, simply type screen in your console:

screen

You can detach from the screen session at any time by typing:
Ctrl+a d

To find the session ID list the current running screen sessions with:
screen -ls

If you want to restore screen 10835.pts-0, then type the following command:
screen -r 10835

The above tutorial is from linuxize.
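When several sessions are running, the screen -ls listing is where you find the ID to pass to screen -r. The hypothetical helper below pulls the session IDs and their Attached/Detached state out of that listing. The sample text is made up in the shape screen prints (an ID such as 10835.pts-0.hostname, sometimes a start time, then the state in parentheses); exact formatting varies between screen versions.

```python
import re

# pid, then ".name", then anything (e.g. a start time), then the state in parentheses
SESSION_LINE = re.compile(r"^\s*(\d+)\.(\S+)\s.*\((Attached|Detached)\)\s*$")


def list_sessions(output: str) -> list[tuple[str, ...]]:
    """Return (pid, name, state) for each session line in `screen -ls` output."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_LINE.match(line)
        if match:
            sessions.append(match.groups())
    return sessions


sample = """There are screens on:
\t10835.pts-0.myhost\t(Detached)
\t10366.pts-1.myhost\t(Attached)
2 Sockets in /run/screen/S-user.
"""
for pid, name, state in list_sessions(sample):
    if state == "Detached":
        print(f"screen -r {pid}")  # command to reattach this session
```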
