Git is a very popular version control system for tracking changes in computer files and coordinating work on those files among multiple people (Wikipedia). It is widely used in data science projects to keep track of code and to support parallel development. Git can be used in very complicated ways; for data scientists, however, we can keep it simple. In this post, I am going to walk through the main use cases for a "Solo Master".
Note. There are many awesome resources out there covering "what is Git" and "the basic concepts in Git"; for these, I would refer you to the official Git website: "Getting Started -- Git Basic".

Now we can start with a cool project! First, go to GitHub and create an empty project, then configure it properly on your local laptop.

Case 1. One working space, nothing goes wrong

This is the ideal and simplest situation: you stage your files with "git add", commit the code, and then push to the remote master branch. Life is easy in this situation.

Case 2. One working space, mistake before "git add"

This always happens ... you start playing with an idea, add some draft code to a file, and quickly figure out that the idea does not work. Now you want to get back to a clean slate. How do you do that? Fortunately, if you have not run "git add" on the modified file, this is very easy. For more details, please refer to "Git checkout".

Case 3. One working space, mistake before "git commit"

You thought the idea was going to work, added a few files, made some changes, ran "git add" a few times, and finally figured out that the result is not right. Now you want to get rid of the mess and go back to the nice, correct, old code. For more details, please refer to "Git reset".

Case 4. One working space, mistake before "git push"

You went even further this time: not only did you run "git add", but the modification took a few hours and you also ran "git commit" a few times! Ah, another huge mistake. What to do?! For more details, please refer to "Git reset".

Case 5. One working space, mistake after "git push"

You pushed the code to production, and other team members found that it contains a big mistake or bug. Now you need to revert the code back to where it was. For more details, please refer to "Git revert".

Case 6. Multiple working spaces

You have two working spaces: one on your company laptop and one on your company workstation. You develop feature 2 in one working space and feature 3 in the other. Now you see the problem: once one working space has pushed, the other is behind the remote branch, and its push will be rejected. The solution is to run "git pull" first.

"git pull" = "git fetch" + "git merge" or "git fetch" + "git rebase"

For the details, refer to "Git pull". Remember that at this point the remote branch already contains the commits pushed from the other working space.

As long as you develop each individual feature in its own working space, this process will cause no problems. This is considered better practice than working on the same feature in different working spaces, because if the same file is modified in both spaces, the merge may produce many conflicts, and resolving them is a big deal for "solo masters".
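The six cases can be sketched as one throwaway script. This is a minimal sketch under a few assumptions, not the only way to handle each case: a local bare repository stands in for the GitHub project, and the directory names (laptop, workstation), the file names (analysis.py, feature2.py, feature3.py), and the demo identity are all hypothetical. Everything runs in a temporary folder, so no real project is touched.

```shell
#!/bin/sh
# Sandbox walk-through of Cases 1-6. All names and paths are hypothetical.
set -eu

sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"            # stand-in for the GitHub project
git clone --quiet "$sandbox/remote.git" "$sandbox/laptop"
cd "$sandbox/laptop"
git config user.name "Solo Master"                       # demo identity (assumption)
git config user.email "solo@example.com"

# Case 1: nothing goes wrong -> add, commit, push.
echo "v1" > analysis.py
git add analysis.py
git commit --quiet -m "Add analysis"
git push --quiet -u origin HEAD                          # -u records the upstream branch

# Case 2: mistake before "git add" -> discard the working-tree edit.
echo "bad idea" >> analysis.py
git checkout -- analysis.py                              # newer Git also offers "git restore"

# Case 3: mistake after "git add", before "git commit".
echo "bad idea" >> analysis.py
git add analysis.py
git reset --quiet --hard HEAD                            # drops staged and unstaged changes

# Case 4: mistake after a few commits, before "git push".
echo "bad idea 1" >> analysis.py
git commit --quiet -am "Bad commit 1"
echo "bad idea 2" >> analysis.py
git commit --quiet -am "Bad commit 2"
git reset --quiet --hard "@{u}"                          # back to the last pushed state

# Case 5: mistake after "git push" -> undo it with a new commit.
echo "bug" >> analysis.py
git commit --quiet -am "Buggy change"
git push --quiet
git revert --no-edit HEAD                                # history is kept, change is undone
git push --quiet

# Case 6: a second working space pushes feature 3 first.
git clone --quiet "$sandbox/remote.git" "$sandbox/workstation"
cd "$sandbox/workstation"
git config user.name "Solo Master"
git config user.email "solo@example.com"
echo "feature 3" > feature3.py
git add feature3.py
git commit --quiet -m "Add feature 3"
git push --quiet

# Back on the laptop: commit feature 2, pull first, then push.
cd "$sandbox/laptop"
echo "feature 2" > feature2.py
git add feature2.py
git commit --quiet -m "Add feature 2"
git pull --quiet --rebase                                # fetch + rebase onto feature 3
git push --quiet
```

Case 4 resets to "@{u}" (the upstream branch) so the sketch does not depend on whether the default branch is called master or main; with a remote named origin and a branch named master, "git reset --hard origin/master" does the same thing. Keep in mind that "git reset --hard" throws the discarded work away, whereas "git revert" keeps the full history, which is why revert is the safer choice once the code has been pushed.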
Great, after these simple case studies, you have become a real "solo master" in Git. You should not lose any code (as long as it is always pushed to the cloud) or need to worry about code inconsistency across multiple working spaces (as long as "git pull" is used correctly). Enjoy using Git!
I love using various online education resources to broaden my view and knowledge base. Recently, I finished the "Executive Data Science" specialization offered on Coursera and found it quite helpful. Here are some thoughts on what I learned from it.

This series of courses does not provide the technical knowledge needed to become a data scientist; instead, it offers the insights and toolkits needed to lead a data science project and manage a data science team. Although its discussion mostly focuses on a "statistical analytics insights - data scientist" work setting, I think quite a few concepts are still transferable to a "machine learning product - data scientist" environment.

The highlights of this specialization (for me) are the following:

1. How to build a team: the different focuses of, and the cooperation between, the "data engineer", the "data scientist", and the "business analyst". Although in reality we often wear all three hats, it is nice to realize that the roles are intrinsically different, so that when growing a team we know which hires or skill sets are needed at the next stage.
2. How to manage a project: basically, using statistical sense to identify the top priority and move in an agile way in the right direction.

I think most of the discussion is common sense in the data science field, and it really helps to get a summary view of how to prioritize data science work, identify the right talent to do it, and continuously monitor and guide the project to produce the end result. I appreciate the faculty at Johns Hopkins for producing this great specialization. Happy learning!
Author: Data Magician
Archives: October 2017