What's the preparation work before git?
What are the layman's steps to start working with git?
What is SSH, and what is an SSH key? How do we use SSH with GitHub?
Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network in a client-server architecture. With SSH keys, you can connect to GitHub Enterprise without supplying your username or password at each visit.
An SSH key is a pair: a public key and a private key, here generated with the RSA algorithm. Why is it called an RSA key?
RSA is named after the surname initials of its inventors: Ron Rivest, Adi Shamir and Leonard Adleman.
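To make the idea of a public/private pair concrete, here is a minimal toy sketch of RSA signing and verifying with tiny textbook primes. It is an illustration only: the function names are made up for this example, the numbers are far too small to be secure, and real SSH keys are created by ssh-keygen, not by code like this.

```python
# Toy RSA with tiny primes: for illustration only, NOT secure.
def make_toy_keypair(p=61, q=53, e=17):
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (needs Python 3.8+)
    return (n, e), (n, d)  # (public key, private key)

def sign(number, private_key):
    # Only the holder of the private key can produce this value.
    n, d = private_key
    return pow(number, d, n)

def verify(number, signature, public_key):
    # Anyone with the public key can check the signature.
    n, e = public_key
    return pow(signature, e, n) == number

public_key, private_key = make_toy_keypair()
challenge = 65
signature = sign(challenge, private_key)
print(verify(challenge, signature, public_key))      # signature matches
print(verify(challenge + 1, signature, public_key))  # tampered value fails
```

This is the reason only the public key is pasted into GitHub: it can check a signature but cannot create one.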
Before we can actually "git" stuff between the remote GitHub repository and our local files, we need to complete the following 3 steps.
Step 1: Generating a new SSH key:
1. Open Git Bash terminal or Open terminal from jupyter notebook (in Sagemaker if AWS).
2. Paste the text below, substituting in your GitHub Enterprise email address.
ssh-keygen -t rsa -b 4096 -C "email@example.com"
This creates a new ssh key, using the provided email as a label.
When you're prompted to "Enter a file in which to save the key," press Enter. This accepts the default file location. Double-check that the path ends in .ssh/id_rsa.
Enter a file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]
or Enter a file in which to save the key (/c/Users/you/.ssh/id_rsa):[Press enter]
or Enter a file in which to save the key (/home/you/.ssh/id_rsa): [Press enter]
At the prompt, type a secure passphrase.
Enter passphrase (empty for no passphrase): [Type a passphrase]
Enter same passphrase again: [Type passphrase again]
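If you would rather script Step 1, the sketch below only builds the ssh-keygen argument list and prints it; nothing is executed. The helper name build_keygen_command and the optional key_path parameter are assumptions made for this example. The -f flag is ssh-keygen's option for choosing the output file.

```python
from pathlib import Path

def build_keygen_command(email, key_path=None, bits=4096):
    """Return the ssh-keygen argument list from Step 1 (nothing is run)."""
    cmd = ["ssh-keygen", "-t", "rsa", "-b", str(bits), "-C", email]
    if key_path is not None:
        # -f names the key file, so the "Enter a file" prompt is skipped
        cmd += ["-f", str(Path(key_path).expanduser())]
    return cmd

command = build_keygen_command("email@example.com")
print(" ".join(command))
# To really run it (it will still ask for a passphrase):
#   import subprocess; subprocess.run(command, check=True)
```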
Step 2: Adding your SSH key to the ssh-agent: the other side of github push.
1. Start another terminal (from the Jupyter notebook in SageMaker on AWS).
# start the ssh-agent in the background
eval $(ssh-agent -s)
Agent pid 59566
2. Add your SSH private key to the ssh-agent. If you created your key with a different name, or if you are adding an existing key that has a different name, replace id_rsa in the command with the name of your private key file.
ssh-add ~/.ssh/id_rsa
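Why the eval in Step 2? ssh-agent -s does not change your shell by itself; it prints a few shell commands that set SSH_AUTH_SOCK and SSH_AGENT_PID, and eval runs them. The sketch below parses output of that shape to show what gets set. The sample text is illustrative (the socket path is made up), and parse_agent_output is a hypothetical helper, not part of any library.

```python
import re

def parse_agent_output(text):
    """Extract the NAME=value assignments that `ssh-agent -s` prints."""
    return dict(re.findall(r"^(SSH_\w+)=([^;]+);", text, flags=re.MULTILINE))

# Illustrative output; the real socket path and pid differ on every run.
sample_output = (
    "SSH_AUTH_SOCK=/tmp/ssh-XXXXXX/agent.59565; export SSH_AUTH_SOCK;\n"
    "SSH_AGENT_PID=59566; export SSH_AGENT_PID;\n"
    "echo Agent pid 59566;\n"
)
env = parse_agent_output(sample_output)
print(env)
```

ssh-add later finds the agent through SSH_AUTH_SOCK, which is why a fresh terminal needs the eval line again.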
Step 3: Adding a new SSH key to your GitHub account
1. Copy the SSH public key to your clipboard. For example, print it in the terminal and copy the output:
cat ~/.ssh/id_rsa.pub
If your SSH key file has a different name than the example code, modify the filename to match your current setup.
You can also list all the saved public keys with: ls ~/.ssh/*.pub
When copying your key, don't add any newlines or whitespace. Right-click to copy everything from "ssh-rsa ******" through the end of " firstname.lastname@example.org".
2. Add SSH key to your GitHub account:
In the upper-right corner of any page, click your profile photo, then click Settings.
In the user settings sidebar, click SSH and GPG keys, then click New SSH key or Add SSH key.
In the "Title" field, add a descriptive label for the new key. For example, if you're using a personal Mac, you might call this key "Personal MacBook Air".
Paste your key into the "Key" field.
Click Add SSH key.
If prompted, confirm your GitHub Enterprise password.
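The warning in Step 3 about stray newlines and whitespace can be checked before pasting. The sketch below is one possible sanity check, assuming the usual one-line "type base64-blob comment" layout of a .pub file, in which the decoded blob starts with the key type name. looks_like_public_key is a hypothetical helper, and the sample key is a fake blob built for the demo, not a real key.

```python
import base64
import struct

def looks_like_public_key(line):
    """Rough check that a string has the shape of one .pub file line."""
    if line != line.strip() or "\n" in line:
        return False  # stray whitespace or an embedded newline
    fields = line.split(" ")
    if len(fields) < 2:
        return False
    key_type, blob = fields[0], fields[1]
    try:
        raw = base64.b64decode(blob, validate=True)
    except ValueError:
        return False  # the middle field is not valid base64
    if len(raw) < 4:
        return False
    # The blob begins with a 4-byte length followed by the key type name.
    (name_len,) = struct.unpack(">I", raw[:4])
    return raw[4:4 + name_len] == key_type.encode()

# Fake blob for the demo only; it is not a usable key.
fake_blob = base64.b64encode(struct.pack(">I", 7) + b"ssh-rsa" + b"\x00" * 8).decode()
good = f"ssh-rsa {fake_blob} firstname.lastname@example.org"
print(looks_like_public_key(good))
print(looks_like_public_key(good + "\n"))
```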
Here are the steps most of us need to do frequently:
Now that we have generated the SSH key and added it both locally (to the ssh-agent) and remotely (to the GitHub account), it's time to start the git "stuff".
1. Open Git Bash terminal or Open terminal from jupyter notebook (in Sagemaker if AWS).
2. Load the ssh-agent program: eval $(ssh-agent -s)
ssh-agent is a program that runs in the background and keeps your key loaded in memory, so that you don't need to enter your passphrase every time you use the key. The nifty thing is that you can choose to let servers access your local ssh-agent as if the keys were already on the server (this is called agent forwarding). This is sort of like asking a friend to enter their password so that you can use their computer.
3. add the ssh key: ssh-add ~/SageMaker/.ssh/id_rsa
4a. print out the current working directory: pwd
4b. To remove a folder and everything in it, use: rm -r folder_name (rmdir folder_name only works on an empty folder)
4c. change the directory to the one that you want to pull to: cd /home/ec2-user/SageMaker/test1
4d. Run ls -la .git or git remote -v to see whether a git repository and a remote already exist. If no remote exists, you can add one:
git remote add origin https://ghe.coxautoinc.com/MediaGroup-DecisionScience/uv-deliver/uv-delivery.git
4e. After checking the git status, create and switch to a new branch: git checkout -b branch_name (to switch to an existing branch, omit -b)
5. check what directory/file inside: ls
6. Make a directory/folder: mkdir name_folder
7. change directory to that folder: cd /home/ec2-user/SageMaker/test1
8a. You can always run git status to check the state of that folder's repository relative to the remote git repo.
8b. If you just want to replace one local file with the version from the master branch on GitHub, run the following.
First run git fetch:
git fetch --all # download the latest commits from all remotes; no local files change yet
Then run git checkout via:
git checkout origin/master -- container/serve/Dockerfile
git checkout origin/master -- model_dataprep.py
git checkout origin/master -- model_scoring.py
git checkout origin/master -- model_selection.py
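Don't forget the -- in the middle; it separates the branch name from the file paths.
Use this one to update all the local files:
git merge origin/master ## merges the fetched master into the current branch; it does not overwrite uncommitted local changes
Note that "Already up-to-date" is not an error: it means the current branch already contains everything in your fetched copy of origin/master.
If you expected newer changes, run git pull, which fetches from the remote first and then merges. To discard local changes and match the remote exactly, use git reset --hard origin/master (this is destructive).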
git fetch is the command that says "bring my local copy of the remote repository up to date."
git pull says "bring the changes in the remote repository to where I keep my own code."
When you use pull, Git tries to automatically do your work for you. It is context sensitive, so Git will merge any pulled commits into the branch you are currently working in. pull automatically merges the commits without letting you review them first. If you don't closely manage your branches, you may run into frequent conflicts.
When you fetch, Git gathers any commits from the target branch that do not exist in your current branch and stores them in your local repository. However, it does not merge them with your current branch. This is particularly useful if you need to keep your repository up to date, but are working on something that might break if you update your files. To integrate the commits into your master branch, you use merge.
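The difference between fetch, merge and pull can be pictured with a toy model. The sketch below is not how Git stores history; commits are just strings in lists, and the class and attribute names are invented for this example. It only shows which copy each command updates.

```python
class ToyRepo:
    """Toy model: commits are strings in lists, not real Git objects."""

    def __init__(self, remote):
        self.remote = remote      # commits on the server
        self.origin_master = []   # local copy of the remote branch
        self.master = []          # the branch you are working in

    def fetch(self):
        # Refresh the local copy of the remote; your branch is untouched.
        self.origin_master = list(self.remote)

    def merge(self):
        # Bring the fetched commits into your working branch.
        for commit in self.origin_master:
            if commit not in self.master:
                self.master.append(commit)

    def pull(self):
        # pull is fetch followed by merge.
        self.fetch()
        self.merge()

server = ["a1", "b2"]
repo = ToyRepo(server)
repo.pull()
server.append("c3")   # someone else pushes a new commit
repo.fetch()
print(repo.origin_master, repo.master)  # fetched, but not merged yet
repo.merge()
print(repo.master)
```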
8c. If you just want to download all the code: git clone git@ghe.*****.com:MediaGroup-DecisionScience/uv-delivery.git
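If we want to create a branch on the remote GitHub repository, make some changes in the local repo, and then save them back to that remote branch, we can do the following:
1. Check all the available branches: git branch -a
2. Create and switch to the branch you want to work on: git checkout -b branch_name (omit -b if the branch already exists)
3. Only if the folder is not yet a git repository, initialize it: git init . (this creates the local .git folder; it does not connect to the remote, which is what git remote add or git clone does)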
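4. After you have made some changes to the local repository, push them to the remote GitHub. Usually you push to your own branch first, then open a pull request on GitHub to merge it into the master branch:
4a. git status to see which files have changed.
4b. git add file1.py file2.py to stage those files locally.
4c. git commit -m "what is the change" to commit locally.
4d. git push origin branch_name to push to the remote GitHub.
If the remote branch has commits that you don't have locally, the push is rejected with the error message below. The safer fix is to integrate the remote changes first (git pull origin branch_name). git push origin branch_name -f forces the update instead and overwrites the remote branch, so use it only on a branch nobody else is using: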
"failed to push some refs to 'git@ghe.*****.com:directory/folder1.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes "
After you finish everything in the terminal, you can use: exit to finish and exit.
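The status/add/commit/push sequence above can be sketched as a small dry-run helper that builds the commands and prints them without running anything. build_push_commands is a hypothetical name for this example. For the forced case it uses git push --force-with-lease, a more cautious relative of -f that refuses to overwrite the remote branch if it has moved since your last fetch; that substitution is this sketch's choice, not part of the steps above.

```python
import shlex

def build_push_commands(files, message, branch, force=False):
    """Return the git commands for one commit-and-push cycle (dry run)."""
    commands = [
        ["git", "status"],
        ["git", "add", *files],
        ["git", "commit", "-m", message],
    ]
    push = ["git", "push", "origin", branch]
    if force:
        # Assumption of this sketch: prefer the safer forced push.
        push.append("--force-with-lease")
    commands.append(push)
    return commands

for cmd in build_push_commands(["file1.py", "file2.py"], "what is the change", "branch_name"):
    print(shlex.join(cmd))
    # To execute for real: subprocess.run(cmd, check=True)
```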
If you want to delete a few files from a branch in the remote github, you can do the following:
1. git checkout branch1 or git checkout master
2. git status #to double check
3. git rm Untitled.ipynb Untitled.csv
4. git commit -m 'remove a few files'
5. git push
Some other git examples:
The following sequence checks out the master branch, reverts the Makefile to two revisions back, deletes hello.c by mistake, and gets it back from the index.
#================================================
$ git checkout master # (1) don't use: git checkout -b master
$ git checkout master~2 Makefile # (2)
$ rm -f hello.c
$ git checkout hello.c # (3)
#================================================
If you want to check out all C source files out of the index:
$ git checkout -- '*.c' # to restore all .c files
$ git checkout -- hello.c # to restore the file hello.c
#================================================
If your previous git commit crashed, then when you reconnect and try git add/commit, you might get the following error message:
"fatal: Unable to create '/home/ec2-user/folder_name/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue."
Then you can use the following to remove the lock file: rm -f .git/index.lock
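A scripted version of the same cleanup might look like the sketch below. remove_stale_lock is a hypothetical helper; like rm -f it does nothing if the file is absent, and it should only be used once you are sure no git process is still running. The demo creates a throwaway folder so it does not touch a real repository.

```python
import tempfile
from pathlib import Path

def remove_stale_lock(repo_dir):
    """Delete .git/index.lock if present; return True if a file was removed."""
    lock = Path(repo_dir) / ".git" / "index.lock"
    if lock.exists():
        lock.unlink()
        return True
    return False

# Demo in a temporary folder that imitates a repository with a leftover lock.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    git_dir.mkdir()
    (git_dir / "index.lock").touch()
    first = remove_stale_lock(tmp)    # lock was there
    second = remove_stale_lock(tmp)   # nothing left to remove
print(first, second)
```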
When we run git remote add origin https://github.com/XXXX/YYY.git and get the error message:
"fatal: remote origin already exists."
you can use ls .git to confirm that a repository exists and git remote -v to see the remotes that are currently configured.
We can delete a remote as follows:
git remote -v # View current remotes
origin https://github.com/OWNER/REPOSITORY.git (fetch)
origin https://github.com/OWNER/REPOSITORY.git (push)
destination https://github.com/FORKER/REPOSITORY.git (fetch)
destination https://github.com/FORKER/REPOSITORY.git (push)
git remote rm destination # Remove the remote named "destination"
git remote -v # Verify it's gone
origin https://github.com/OWNER/REPOSITORY.git (fetch)
origin https://github.com/OWNER/REPOSITORY.git (push)
To remove origin itself, so that git remote add origin works again, use: git remote rm origin
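To check from a script whether a remote such as origin already exists before adding it, the output of git remote -v can be parsed. The sketch below works on a sample string copied from the listing above instead of calling git; parse_remotes is a hypothetical helper.

```python
def parse_remotes(text):
    """Turn `git remote -v` output into {name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes

sample = """\
origin https://github.com/OWNER/REPOSITORY.git (fetch)
origin https://github.com/OWNER/REPOSITORY.git (push)
destination https://github.com/FORKER/REPOSITORY.git (fetch)
destination https://github.com/FORKER/REPOSITORY.git (push)
"""
remotes = parse_remotes(sample)
print("origin" in remotes)
print(remotes["destination"]["push"])
```

With real output, the text would come from something like subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout.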
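How to create a tar archive with faked information:
import tarfile

namelist = ["file1.py", "file2.py"]  # placeholder: the files you want to archive

with tarfile.open("sample.tar.gz", "w:gz") as tar:
    for name in namelist:
        # Store each file under fakeproj-1.0/ with made-up owner details
        tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
        tarinfo.uid = 123
        tarinfo.gid = 456
        tarinfo.uname = "johndoe"
        tarinfo.gname = "fake"
        with open(name, "rb") as f:  # file() only exists in Python 2
            tar.addfile(tarinfo, f)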
How to read a compressed tar file from an S3 bucket and list its contents:
import s3fs
import tarfile

fs = s3fs.S3FileSystem()
model_file_s3 = 's3://directory/output/model.tar.gz'
s3f = fs.open(model_file_s3, 'rb')
tar = tarfile.open(fileobj=s3f)

# Describe every member of the archive
for tarinfo in tar:
    print(tarinfo.name, "is", tarinfo.size, "bytes in size and is")
    if tarinfo.isreg():
        print("a regular file.")
    elif tarinfo.isdir():
        print("a directory.")
    else:
        print("something else.")

# Count the regular files and add up their size in MB
count = 0
size = 0
for tarinfo in tar:
    if tarinfo.isreg():
        count = count + 1
        size = size + tarinfo.size / (1024 * 1024)
    else:
        print("something else.")
print('num of files', count)
print('size of the all files', size)
tar.close()
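The counting logic can be tried without S3 access. The sketch below assumes nothing about the bucket: it builds a small gzip-compressed archive in memory to stand in for the S3 object, then counts the regular files and their total size. summarize_tar and the file names are hypothetical, made up for this example.

```python
import io
import tarfile

def summarize_tar(fileobj):
    """Return (number of regular files, total size in MB) for a tar archive."""
    count, total_bytes = 0, 0
    with tarfile.open(fileobj=fileobj) as tar:
        for member in tar:
            if member.isreg():
                count += 1
                total_bytes += member.size
    return count, total_bytes / (1024 * 1024)

# Build a tiny archive in memory in place of the file opened from S3.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for name, payload in [("a.txt", b"x" * 1024), ("b.txt", b"y" * 2048)]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

count, size_mb = summarize_tar(buffer)
print(count, size_mb)
```

The same function should accept the object returned by fs.open(model_file_s3, 'rb'), since tarfile only needs a readable, seekable file object.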
When pushing, you might get error messages like the following:
error: src refspec remotes/origin/branch1 matches more than one
warning: refname 'remotes/origin/branch1' is ambiguous.
fatal: Ambiguous object name: 'remotes/origin/branch1'
fatal: The current branch remotes/origin/branch1 has no upstream branch
Solution: push the current commit (HEAD) explicitly to the remote branch, which avoids the ambiguous name. Note that -f overwrites the remote branch.
git push -f origin HEAD:branch1