Homework

Homework. Due Fri October 19, 2018 at 11:59pm [html] [rmd] [pdf]
Solutions: [html] [pdf]

Final Project

Guidelines:

Downloadable Instructions: [html] [pdf]

Choose a project that interests and excites you. It is intended as a chance for you to get more practice with R and to explore its more advanced tools. You might consider working on a problem related to your own research and using your own data. However, you should focus on programming rather than on answering a research question. Through the project you should demonstrate the skills you have learned in this class, but you are encouraged to implement material not taught in class.

You are responsible for formulating your own project. However, you should consult with me on the scope of your planned work. Below is a list of topics you might want to consider when planning your project.

You can work in groups of up to four students, and you are encouraged to do so. The goal of the project should be to have fun! You are taking this class because you want to learn R, and this is your opportunity to explore it in any way you like. As a rough estimate, the project should take around 20 hours of work per person in the group.

Schedule for deliverables:

Proposal (title and one-paragraph abstract) due Tue October 16, 2018 at 11:59pm.
Final write-up and R code due one week after the last class, on Thu November 1, 2018 at 11:59pm.

Everything should be uploaded to Canvas before the deadline.

Project Ideas:

Biological data. For example, projects could involve differential expression, flow cytometry, mass spectrometry, image analysis, phylogeny estimation, single-cell clustering, data visualization, or multivariate analysis. There are many packages on Bioconductor and CRAN developed specifically for these kinds of analyses, and many datasets and implementations are available in the Bioconductor repository.
Economic data. For example, you can find rich data on the worldwide distribution of wealth and income in the World Inequality Database.
Sports data. For example, which basketball player would you want on your team? Data are available in this blog post.
Financial data. Analyze stock returns or compute optimal portfolios. See this review of what people in the field do, and find more resources here. You can also download data from this database.
Twitter data. Use the content of the posts to perform activity and sentiment analysis.
Yelp data. A similar analysis (http://varianceexplained.org/r/yelp-sentiment/) can be done for Yelp data using this dataset.
Movie preference data. For example, develop a movie recommender system.
Cities/States data. A list of available datasets is provided here. Use them to:
- identify the best and worst neighborhoods to live in based on metrics such as the number of parks within walking distance, crime statistics, etc.;
- identify concrete measures your city could take to improve quality-of-life metrics like those described above (for example, where the city should put a park); or
- predict when and where crimes will occur.
Find a Kaggle competition, past or present, and solve the problem using R.
Find a dataset on the UCI Machine Learning Repository and try to draw some interesting insights.
Or create a stunning set of visualizations for a dataset of your choice.

More public data is available here and here.
