Stanford introductory course on programming in R.
Downloadable Instructions: [html] [pdf]
Choose a project that interests and excites you. It is intended as a chance for you to get more practice with R and to explore its more advanced tools. You might consider working on a problem related to your own research and using your own data. However, you should focus more on programming than on answering a research question. Through the project you should demonstrate the skills you have learned in this class, but you are encouraged to implement material not taught in class.
You are responsible for formulating your own project. However, you should consult with me on the scope of your planned work. Below, I have included a list of topics you might want to consider when planning your project.
You can work in groups of up to four students, and you are encouraged to do so. The goal of the project should be to have fun! You are taking this class because you want to learn about R, and this is your opportunity to do so in any way you like. As a rough estimate, the project should take around 20 hours of work per person in the group.
Everything should be uploaded to Canvas before the deadline.
Biological data. For example, projects could involve differential expression, flow cytometry, mass spectrometry, image analysis, phylogeny estimation, clustering of single cells, data visualization, or multivariate analyses. Many packages on Bioconductor and CRAN are developed specifically for these kinds of analyses. Many datasets and implementations are available in the Bioconductor repository.
Economic data. For example, you can find rich data on the worldwide distribution of wealth and income in the World Inequality Database.
Sports data. For example, decide which basketball player you would want on your team. Data are available in this blog post.
Financial data. Analyze stock returns or compute optimal portfolios. See here for a review of what people in the field do, and find more resources here. You can also download data from this database.
Twitter data. Use the content of the posts to perform activity and sentiment analysis.
Yelp data. A [similar analysis](http://varianceexplained.org/r/yelp-sentiment/) can be done for Yelp data using this dataset.
Movie preference data. For example, develop a movie recommender system.
Find a Kaggle competition, present or past, and solve the problem using R.
Find a dataset in the UCI Machine Learning Repository and try to draw some interesting insights.
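To give a sense of scale for the financial data topic above, here is a minimal sketch in base R that computes daily log returns and global minimum-variance portfolio weights. It is only an illustration under stated assumptions: the prices are simulated, the tickers `AAA`, `BBB`, and `CCC` are made up, and the helper `min_var_weights()` is a hypothetical name, not something provided by the course or by any package. A real project would replace the simulated prices with downloaded data and would likely go well beyond this.

```r
# Illustrative sketch only: simulated prices for three made-up tickers.
set.seed(1)
n_days  <- 250
tickers <- c("AAA", "BBB", "CCC")  # placeholder names, not real tickers

# Simulate prices as geometric random walks (an assumption for illustration).
shocks <- matrix(rnorm(n_days * length(tickers), mean = 0.0005, sd = 0.01),
                 ncol = length(tickers))
prices <- 100 * exp(apply(shocks, 2, cumsum))
colnames(prices) <- tickers

# Daily log returns: row-wise differences of log prices, column by column.
log_returns <- diff(log(prices))

# Global minimum-variance weights (short positions allowed):
#   w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
# Hypothetical helper written for this example.
min_var_weights <- function(returns) {
  sigma <- cov(returns)            # sample covariance of the returns
  ones  <- rep(1, ncol(returns))
  w     <- solve(sigma, ones)      # solves sigma %*% w = ones
  w     <- w / sum(w)              # rescale so the weights sum to one
  names(w) <- colnames(returns)
  w
}

weights <- min_var_weights(log_returns)

# Daily standard deviation of the resulting portfolio.
portfolio_sd <- sqrt(drop(t(weights) %*% cov(log_returns) %*% weights))

print(round(weights, 3))
print(portfolio_sd)
```

The weights sum to one by construction, and because short positions are allowed, the portfolio's standard deviation cannot exceed that of any single asset in the sample. Adding constraints such as no short selling would require a quadratic programming solver and is one way to extend this into a project.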