
 Make sure you spend most of your time writing up the results section of the assignment, as it is the most important piece. The format is very important, so make sure your text, tables, and figures are all following APA format.  

RSM801 Week 1 Assignment

In a survey with adult women, 150 participants answered questions about depression and media use.

Before we can explore relationships between these variables, we must check the data set for errors, outliers, and in the case of scales, transform and sum variables.

To begin, download the data file CESD(1).sav (click on SAVE AS); once it has downloaded, open the file in SPSS.

Once the data is open in SPSS, notice the tabs at the bottom: there is a variable view and a data view.

· The data view allows you to see all the data (this is where you would enter new data).

· The variable view allows you to see the variables (this is where you would create new variables).

DESCRIPTIVE STATISTICS

We need to explore our data to get a beginning picture of what is happening.

First try ‘Descriptives’

· In the menu bar, click on ANALYZE/DESCRIPTIVE STATISTICS/DESCRIPTIVES

· A dialog box will open

· Highlight the variables:

Age

HHSize

· Click on the arrow in the middle that moves them to the box on the right

· Click on OPTIONS (click on any options that you might want to add)

· Click on CONTINUE to close the Options box, then OK to run the analysis

· Your output file will now open
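If you prefer working from syntax (the PASTE button in any SPSS dialog will generate it for you), the dialog steps above correspond roughly to the sketch below; the statistics listed are common OPTIONS choices, not requirements.

* Descriptive statistics for Age and HHSize.
DESCRIPTIVES VARIABLES=Age HHSize
  /STATISTICS=MEAN STDDEV MIN MAX.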

Q1) Provide the means for the following variables:

Age ____________

HHSize _________

Q2) Next, in the menu bar, click on ANALYZE/DESCRIPTIVE STATISTICS/FREQUENCIES

· Now highlight the variables:

Maritalstatus

Education

Work

· Click on the arrow in the middle that moves them to the box on the right

· Click on OPTIONS (click on any options that you might want to add)

· Click on CONTINUE to close any subdialog, then OK to run the analysis

· Your output file will now open
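As a sketch, the equivalent syntax is below. FREQUENCIES reports percentages and cumulative percentages by default, which is what the questions under Q2 ask about.

* Frequency tables for the categorical variables.
FREQUENCIES VARIABLES=Maritalstatus Education Work.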

What percentage of the sample is:

Married _______________

Has less education than a college degree? (Hint: Look at cumulative frequencies)

________________

Worked for pay? _________________

Q3) For this assignment, you also need to generate a depression score for each participant. There are 20 items from the CES-D scale (variables CES_0001 to CES_0020).

Transform->Recode into Different Variables

To start, you will need to reverse score the four positive items (4, 8, 12, and 16).

1. Transform->Recode into Different Variables

2. Click on “Display Variable Names” so that you can see the variable names

3. Click on CES_0004 and move it to the middle.

4. Under 'Output Variable', enter the Name CES4rev and the Label 'Reverse score of CES4', then click CHANGE.

5. Repeat steps 3 and 4 for CES_0008, CES_0012, and CES_0016 (giving the output variables the corresponding names, e.g., CES8rev, CES12rev, and CES16rev).

6. Click on 'Old and New Values'

7. Code those individuals who reported “Most or all of the time” ('Old Value' = 3 into 'New Value' = 0)

8. Code those individuals who reported “Occasionally or a moderate amount of time” ('Old Value' = 2 into 'New Value' = 1)

9. Code those who reported “Some or a little of the time” ('Old Value' = 1 into 'New Value' = 2)

10. Code those who reported “Rarely or none of the time” ('Old Value' = 0 into 'New Value' = 3)

11. Click ADD after each value pair, then CONTINUE, then OK

12. Run Analyze->Descriptive Statistics->Frequencies comparing the reversed and original frequencies for items 4, 8, 12 and 16

13. Include the output here (copy and paste the frequencies for CES_0004 and CES4rev). Does this match what you would expect to find?
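If you would rather script the recoding (or double-check your dialog work), a sketch of the equivalent syntax follows; the value mapping assumes the items are coded 0-3, as in steps 7-10.

* Reverse-score the four positive CES-D items into new variables.
RECODE CES_0004 CES_0008 CES_0012 CES_0016 (3=0) (2=1) (1=2) (0=3)
  INTO CES4rev CES8rev CES12rev CES16rev.
EXECUTE.
* Compare the original and reversed distributions, as in step 12.
FREQUENCIES VARIABLES=CES_0004 CES4rev CES_0008 CES8rev CES_0012 CES12rev CES_0016 CES16rev.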

Q4) Transform->Compute Variable

There are times when we will want to compute a new variable based on the data we have. Create a new variable summing the 20 items from the CES-D scale.

1. Click on Transform->Compute Variable

2. Your new variable ('Target Variable') will be called CESDTOT.

3. Use the following formula to generate the new variable: (CES_0001 + CES_0002 + CES_0003 + CES4rev + CES_0005 + CES_0006 + CES_0007 + CES8rev + CES_0009 + CES_0010 + CES_0011 + CES12rev + CES_0013 + CES_0014 + CES_0015 + CES16rev + CES_0017 + CES_0018 + CES_0019 + CES_0020)

4. This formula takes the 20 items (including the 4 that needed to be reversed) and sums them together.
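A sketch of the equivalent syntax follows. Note that with the + operator, any case missing even one item gets a missing CESDTOT; that behavior is worth remembering for Q5.

* Sum the 20 CES-D items, substituting the reversed versions of items 4, 8, 12, and 16.
COMPUTE CESDTOT = CES_0001 + CES_0002 + CES_0003 + CES4rev + CES_0005 + CES_0006
  + CES_0007 + CES8rev + CES_0009 + CES_0010 + CES_0011 + CES12rev + CES_0013
  + CES_0014 + CES_0015 + CES16rev + CES_0017 + CES_0018 + CES_0019 + CES_0020.
EXECUTE.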

Use the 'Explore' command in SPSS (Analyze->Descriptive Statistics->Explore) and determine the measures of central tendency and spread for CESDTOT (Depression sum score) and Media use (Media use Score).

· A dialog box will open.

· You will need to put your variables of interest into the DEPENDENT LIST box (similar to how you did it in DESCRIPTIVES previously).

· For your dissertation, you may also want to include a grouping variable in the FACTOR LIST box (a factor is the same thing as an independent variable; thus, this allows us to break the statistics up by group membership). For now, we will leave this box empty

· Click on OK to run the analysis

· Review the new output

· Summarize the mean and SD for CESDTOT and Media Use (round numbers to two decimal places)
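A sketch of the equivalent EXPLORE syntax is below; it assumes the media use variable is named MediaUse, as in Q6.

* Central tendency, spread, and plots (including the box plot useful for Q7).
EXAMINE VARIABLES=CESDTOT MediaUse
  /PLOT BOXPLOT HISTOGRAM
  /STATISTICS DESCRIPTIVES.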

Q5) Some of the data is missing for individual CES-D items. The important thing in dealing with missing data is to figure out if the data is missing randomly or if there is some pattern (reason) to why the data points are missing. Does there appear to be a pattern to the missing data?

How might one deal with the missing data? (Do not do this, simply report what you think based on our discussion this week).

Q6) Examine the Descriptive Statistics output you generated for CESDTOT and MediaUse for outliers. Remember that univariate outliers are those with very large standardized scores (z scores greater than 3.3 in absolute value) that are disconnected from the distribution. SPSS DESCRIPTIVES will give you the z scores for every case if you select 'Save standardized values as variables', and SPSS FREQUENCIES will give you histograms (use SPLIT FILE/Compare Groups under DATA for grouped data).
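As a sketch, the syntax below saves the z scores (SPSS names the new variables ZCESDTOT and ZMediaUse) and then draws histograms; MediaUse is again assumed as the variable name.

* Save standardized values as new Z-prefixed variables.
DESCRIPTIVES VARIABLES=CESDTOT MediaUse
  /SAVE.
* Histograms of the z scores, to spot cases disconnected from the distribution.
FREQUENCIES VARIABLES=ZCESDTOT ZMediaUse
  /FORMAT=NOTABLE
  /HISTOGRAM.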

Did you find any univariate outliers? Briefly write up your conclusion about univariate outliers, using data to back up your report.

Q7) Finally, write up the results of your descriptive statistics analysis (Q1-6) in APA format as if you were describing the analysis for your dissertation (it will probably be only a paragraph). Make sure to include figures (e.g., a box plot).  The APA formatting may be difficult, but it will be helpful in the long run to spend some time learning it properly now.  

If you have any questions, do not hesitate to ask! This is meant to be a learning exercise.

WEEK 1 LECTURE TRANSCRIPT

All righty, you all. So, welcome to RSM 801, your Quantitative Research II course. I am Dr. Chastity Ratliff, and we've got a lot to cover today, so I'm not going to go into a lot of detail about my background, but this is just a visual depiction of my education journey, from Southeast Missouri State University, doing studies on animal behavior and learning, to my later graduate school education, where I focused on psychology and law, societal threat, prejudice, those types of things, while I was working on my PhD. So I've been doing research for a lot of years. I believe I have four peer-reviewed journal publications from the research that I've done, and I've been teaching at the undergraduate and graduate level for about nine years altogether now. Okay, so I like to start off my stats and methods courses with a little pause to try to get you all to really start thinking differently and to shift your approach to learning in these kinds of classes. Because I get a lot of students who do really well in their theory classes who come into these stats and methods courses and struggle when they lose points or don't hit all the marks. So I like to take a few minutes to explain the difference and what you'll need to do to be successful. Your theory courses test your ability to understand and analyze concepts, but methods and statistics courses require you to actually apply those concepts to solve novel problems. So the feedback that I give you all, and I spend a lot of time giving you feedback, and I hope that you use it, focuses on precision and accuracy rather than those broader theoretical interpretations. My feedback is really going to focus on how you are applying those things within the bounds of the scientific method and the research process. This is also a new type of academic writing. Results sections require very precise, standardized formatting. A lot of students find it tedious or annoying that you have to put this thing in italics and use so many decimal points, and we think those kinds of things don't matter and might not pay that much attention to them. But it really does matter for a reader who is trained in these types of fields to be able to know how to read and understand what you're trying to communicate. So statistical writing emphasizes clarity and accuracy over elaboration. Forget all that stuff about trying to fluff up your papers to hit the page requirement. We want clear, accurate results sections instead of fluff. APA style guidelines must be followed exactly, and the sooner that you just get in the habit of writing in that way, the sooner it becomes second nature and improves the overall quality and readability of your writing. This type of writing overall is more technical and concise than what you will find in theoretical courses. So essentially, learning statistics, you all, is like learning a whole new language. You're learning a new way to communicate your research findings. And for this reason, regular practice is essential. You cannot cram for statistical understanding or substitute your own understanding with that of some type of AI or something like that. Mistakes are so normal, and they are actually a very valuable part of the learning process. I want you to make mistakes early on, listen to the feedback that I give you, and incorporate it, because those are such valuable learning experiences.
So, like all of these kinds of classes, the skills that you build each week are going to build on each other throughout the whole term. As I've mentioned, the feedback I give you is going to be detailed and specific. I tend to focus on the issue that needs to be addressed, because I have a lot of grading and I want to zero in on the things that need work. So please don't take that personally. My focus is on helping you master the technical skills. My feedback to you is going to address whether you actually completed the assignment or the task as required. So it is on execution: rather than how much you tried, if you understood it but didn't demonstrate that or communicate that, I have to address that in my feedback. But trust me, I'm spending all this time giving you all feedback and guidance now to make you stronger down the line when you face your comprehensive exams and your ultimate dissertation. So, some things that you can do to be successful: do not underestimate the amount of time that you need to focus on this class. It's a lot of work, and if you struggle with this kind of material, then I would say you need to plan for about 15 to 20 hours per week of dedicated work. I know that feels overwhelming, but if you're a student who struggles, I'm especially talking to you. Start your assignments early so that you have time to ask me questions and make revisions. Make sure you're in the class four or five times per week at minimum, and it will do you well. It will also serve you well to schedule regular practice sessions with your statistical software of choice. Now, you have to be an active learner on your end. All of these doctoral classes are only eight weeks, but you are getting all of the material and learning that you would in a 16-week course. We're just cramming it into these eight weeks to let you all work on your PhDs from wherever you are and still work. So you have to be active on your own. We only get one live hour together per week, but I am going to spend that time working: I'm going to give you a little lecture each week and then try to walk you through examples step by step. I encourage you to create your own practice problems. I've had students in this course in the past, and the most successful students got together with their classmates and formed study groups. They were immensely helpful, and I could always tell the students who were working in a study group. One of them would ask questions, and they would work together to fill in the pieces that they missed. So I cannot encourage that enough. Rely on each other. I encourage you to maintain an error log to track and learn from your mistakes so that you can correct them in later work. Building on that, use my feedback effectively. Like I said, I take a lot of time to give it to you, not to beat you up, but to improve your work. So please review all of the feedback carefully before starting new assignments. Keep a running list of common errors to check for in your future work so that you're not making the same mistakes over again. And always, please, ask for clarification if assignments or feedback are unclear. So on that, what can you expect from me throughout this term? Lots of support, you all. I am here to support you. I want to see you succeed. I'll be holding weekly live sessions to review the concepts and go over the assignments. There are clear rubrics for all of the assignments; I suggest you look at those before submitting your work. I am always available for individual meetings.
All you have to do is send me an email with three days and times that you're available. Just to save time from the beginning: if you want to meet with me or talk to me, just say, hey, I'd love to have a meeting, I'm available Monday from whatever to whatever, Tuesday, and Wednesday. Send me your availability, and then I'll get back to you within 24 work hours, Monday through Friday, about a meeting. We'll make it happen. And I'm also absolutely willing to give you additional resources and examples if you find yourself stuck. You just need to reach out to me. So when you reach out, you can expect responses to emails within 24 hours on weekdays. I will make regular course announcements and try to give you very clear assignment instructions and explicit grading criteria through the rubrics. My teaching approach is supportive. I try to give you all step-by-step instruction for new concepts, with examples. These assignments are scaffolded, meaning they build upon one another progressively, and overall the focus is on practical application. Now, what do I expect from you all? I really expect you to uphold academic standards, meaning that your work should demonstrate graduate-level attention to detail, precise adherence to APA format, clear documentation of all of your statistical procedures, and professional academic writing. On that note, I do not want to see AI-generated work. You can use AI in early stages of research for brainstorming and things like that, and at the end to kind of polish your work, but the bulk of the work needs to be yours. I want to see the messy struggle, because I've seen students come through this program enough now to see those who are over-relying on AI get into upper-level classes, into comps, and really struggle because they didn't take the time to really learn the earlier material, and it starts to show up. And so that's unacceptable, and I will be on the lookout for that, and I really do not want to see you all using AI-generated work in this class. I want you to authentically engage with your classmates in discussions. I need you to be proactive with me and communicate about any challenges you might be facing. I expect timely submission of assignments, and I do not accept late work without you reaching out to me. So if, before the due date comes, you realize you're not going to be able to make it due to some circumstance, shoot me an email. If you shoot me an email before it's due and let me know whatever's happening, give me just a little Cliff's Notes summary of whatever is happening, I most likely won't take any points off. If you don't email me until after the fact, I'll go from there. But if you don't email me at all and you just submit a late assignment, I'm just not going to grade it. So that's kind of the deal we have here. I will absolutely be flexible and understanding and kind about whatever you have going on, but you have to communicate that with me, okay? All right. So: timely submission, and actively engaging with my feedback. And then, this class is challenging for a lot of students because you have to develop some technical skills for the analyses. You've got to get some proficiency going with the statistical software. You need to learn to follow those statistical procedures exactly, to clearly present your output, and to accurately interpret your results. All right, so on that note, as I've mentioned, you are free to use either Jamovi or SPSS. If you choose to use SPSS, you do need the grad pack.
The standard version does not have everything you need to do all of the analyses, so you need the more expensive package. But if you choose to use Jamovi, it is free. That's what I will be doing all of my demonstrations in, and the add-ons are free in it as well. It's capable of doing everything that we will do in this class. It is what I will be using, but if you're more comfortable with SPSS, you're free to use that, and there are some additional resources in the course for SPSS users. Okay, so now I'm going to jump into week one. Week one stresses students out in this class, you all, just as a heads up, because we jump right in. We're essentially jumping right in at the point where, if you had your data, if you had your results, you would dive in and start trying to understand the quality of your data. Okay, so that's where we're starting this week. We're building on foundations from Quant I, and the focus here is on advanced data handling and analysis. So we do jump right in, but this is of critical importance for everything that comes after, particularly your dissertation research, and the emphasis throughout is on real-world challenges in conducting research. All right. So when we talk about data quality, the first two things we need to look at are outliers and missing data. Let's start with outliers. What is an outlier, and how do you determine if you have them? You'll see these little dots out here that don't follow the rest of the pattern. This one is a univariate outlier, and this one is a multivariate outlier. A univariate outlier varies in terms of a single variable, and a multivariate outlier is one which involves more than one variable. What you're looking at here are scatter plots. These are one of the ways we determine if we have outliers, looking visually. Another way is to use box plots. We can also use z-scores, which we will do. After you've gone through these techniques and identified your outliers, then you have to investigate potential causes for those outliers. Was it an error in data collection, or was it a valid but unusual data point? So, how do you handle outliers if you find them? Well, first, you check for error. Any typos or data entry mistakes can be corrected or removed. For example, if someone said they worked a 150-hour work week, that clearly isn't valid; we would likely remove that. We have to consider the theoretical relevance. Is the score, is that outlier, unusual but meaningful for your population? For example, a very high burnout score might represent an actual at-risk teacher or employee. It can be an outlier but still valid and meaningful. To handle these, you want to use transparent, common rules. You want to keep the outlier if it is within the plausible range of your measure. You can transform it, through a log transformation for example, if your distribution is skewed. Another technique is to Winsorize, or cap, the extreme values if the outlier is legitimate but overly influential and is skewing your results. But remove it only if you can clearly justify why that case doesn't belong to your study population. In general, we try to keep outliers, and you must keep one if it represents a valid score in your sample. Finally, you need to document how you identified the outlier: a z-score greater than 3.3 in absolute value, box plots, scatter plots, or another way. State what you did and why in your results and methods sections. Okay, so we've got outliers handled. Now, let's talk about missing data.
The general steps for analysis with missing data are to (1) identify the patterns and reasons for the missing data, and recode correctly if necessary, (2) understand the distribution of missing data, and (3) decide on the best method of analysis. So, going to step one: understand your data. Why is it missing? Is it due to attrition, some social or natural processes that cause people to drop out of your study? Maybe you were studying students and some of them graduated or dropped out, or if you're studying an elderly population, maybe some of them passed on. What are the reasons for the missing data? It could be due to a skip pattern. For example, certain questions are only asked of respondents who indicated they were married. Sometimes only certain participants are asked to answer a question; that's a legitimate skip pattern. It could be intentional: missing as part of the actual data collection process, where one condition sees one set of questions and another sees a different set. It could be random data collection issues, or it could be respondent refusal or just non-response. In other words, they didn't answer the question. Step two: so we've dug into it, we've kind of figured out what we think the reason is for why the data is missing, and then we have to look at that probability distribution of missingness. Considering that probability is asking ourselves, in looking at the data, are certain groups more likely to have missing values? For example, are respondents in service occupations less likely to report their income with tips? Another way to consider it is: are certain responses more likely to be missing? For example, respondents with high income are less likely to report their income. So, we need to see if that's the case. And then, certain analysis methods, those assumptions that we check before we do our analyses, this is part of it, because certain analysis methods assume a certain probability distribution of missingness. So if your data doesn't meet that assumption, you need to do something about it. Missing data mechanisms we can break down into three categories. There is data that's missing completely at random, MCAR data. This is when the missing data doesn't depend on either observed or unobserved values. For example, lab equipment randomly malfunctioned, causing some missing measurements; that is missing completely at random. Missing at random is when the missing data does depend on observed variables, but not on unobserved values. For example, service workers are less likely to report their income, and that is related to their occupation type. That's missing at random, but it's not completely at random, because it is tied to part of what we're studying. And then there is missing not at random. This is missing data that depends on the unobserved values themselves. For example, those high-income individuals that are less likely to report their income: they will just not answer. Okay. So, when we're exploring missing data mechanisms, you can never be 100% sure about the probability of the missing data, because we don't actually know the missing values. There are things we could do, like testing for missing completely at random using t-tests, but that's not totally accurate. So many missing data methods assume that data are missing completely at random or missing at random, but in reality, our data are often missing not at random.
So, we also need methods specifically for that, including a selection model like Heckman or pattern mixture models. We don't have to go too far into the weeds here; I'm just giving you an overview. Because this is what we really need to dig into: how do you deal with your missing data? Well, you have to use what you know about why the data is missing and anything you know about the distribution of missing data. Are there any meaningful patterns? Then you can decide on the best analysis strategy to get you the least biased estimates. Your options are deletion methods, like list-wise or pair-wise deletion. There are single imputation methods, so mean or mode substitution, the dummy variable method, or single regression. And then there are model-based methods, so maximum likelihood modeling and multiple imputation modeling. Let me dig into those deletion methods. With list-wise deletion, you would take out all of the responses for a case with any missing value, and with pair-wise deletion, we're looking at pairs of responses. A little more concrete example here: with list-wise deletion, we only analyze cases with data available on each variable. So looking here, we would not analyze these first two cases at all, because they don't have data on each variable. We would essentially ignore all of that data. The advantages of that are that it's simple and it gives us easy comparability across analyses. But there are big disadvantages: you're losing data, so you're reducing statistical power, lowering your sample size, and not using all available information. And with this kind of list-wise deletion method, if the data is not missing completely at random, then your estimates may end up biased. As a note, list-wise deletion often produces unbiased regression slope estimates, as long as missingness is not a function of the outcome variable. Here's an illustration of pair-wise deletion. This is saying: go ahead and analyze all of the cases in which the variables of interest are present. In this example, let's say we wanted to look at eighth and twelfth grade test scores but we wanted to eliminate gender. Then we could analyze the third, fourth, and fifth cases. We could actually go ahead and analyze the sixth case as well, because only gender is missing there, and that's not one of the variables we're looking at. That would make sense for pair-wise deletion. Hopefully that makes sense, you all; if not, feel free to interrupt me at any time if you have questions. Dr. Ratliff, I do have one question. Yeah, please. So is it one or the other, or can you use a combination of both with your data? You'll want to choose one way or the other. Okay. Yeah.
So when you get your hands on data, you're going to spend a lot of time digging in, trying to figure out why there are any outliers or missing data, and then deciding how you're going to go about handling it. List-wise is saying they either completed everything or we're not taking them, and pair-wise is saying as long as they completed everything for this analysis, we're going to keep it. Okay, thank you. Yeah. So, advantages of this: it does keep as many cases as possible and uses all information possible within each analysis. But the disadvantage here is that we can't make direct comparisons across analyses in the data set, because your sample is slightly different each time. Say one time we wanted to look at gender and eighth grade math test scores; that would give us a certain number of pair-wise options to work with. Then we might want to look at gender and twelfth grade math scores. That's the disadvantage: we end up with a slightly different sample each time by using pair-wise deletion. Okay. So, some other methods would be the single imputation methods: mean/mode substitution, dummy variable controls, and conditional mean substitution. I'll go through these with you. Mean/mode substitution replaces missing values with the sample mean or mode for that variable, and then you would run the analysis as if all the cases were complete. The advantage of mean/mode substitution is that you can use a complete-case analysis method, but the disadvantage is that it reduces variability. When you have missing data and you just plug in the mean or the mode, that reduces variability, which is this flattening line that you're seeing here. It reduces variability and weakens the covariance and correlation estimates in the data, because it's ignoring the real relationship between variables and assuming the mean. Now, dummy variable adjustment is another option. This is where we create an indicator for the missing value. We would enter a one if the value is missing for that observation, and a zero if the value is observed for that observation, for that question. Then we would impute the missing values to a constant, such as the mean, and include that missing indicator in a regression analysis. Using the dummy variable adjustment does use all available information about that missing observation, but the disadvantage is that it absolutely results in biased estimates and is not theoretically driven. So the dummy variable adjustment is really best when the value is missing because of a legitimate skip. When it was legitimately skipped, then it makes sense to use the dummy variable adjustment; otherwise, you run the risk of biased results. All right, now there's regression imputation. What this does is replace missing values with a predicted score from a regression equation. This is a little more precise and complex. The advantage here is that it's using information from the actual observed data, but the disadvantage is that regression imputation tends to overestimate the model fit and the correlation estimates, and it weakens variance. So, I'm sure you're picking up on it: with each of these methods, you kind of have to weigh your pros and cons.