
Table of Contents

Factors Influencing the Accuracy of Ability Estimation in Constructed-Response Items: The Interaction Between the Number of Raters and the Number of Items

《心理学探新》[ISSN:1003-5184/CN:36-1228/B]

Issue:
 2018, No. 1
Pages:
 73-79
Section:
 Psychological Statistics and Measurement
Publication Date:
 2018-01-15

Article Info

Title:
 The Impact of Number of Raters and Number of Items on Ability Estimation in Constructed-response Items
Article ID:
 1003-5184(2018)01-0073-07
Author(s):
 Sun Xiaojian (孙小坚) 1, Kang Chunhua (康春花) 2, Zeng Pingfei (曾平飞) 2, Xin Tao (辛涛) 1
 1. Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875; 2. College of Teacher Education, Zhejiang Normal University, Jinhua 321004
Keywords:
 graded response multilevel facets model; number of raters; number of items; ability estimation
CLC Number:
 B841.2
DOI:
 -
Document Code:
 A
Abstract:
 Using the graded response multilevel facets model (GR-MLFM) formulated by Kang, Sun, and Zeng (2016), this study examined how the number of raters and the number of items affect the accuracy of ability estimation for constructed-response items. The simulation results show that: (1) as the number of items increases, the correlation between the ability estimates and the true values also increases; (2) the main effects of the number of raters and the number of items on the mean absolute bias (MAB) and the root mean square error (RMSE) are both significant, as is their interaction; (3) simple-effects analyses show that when the number of items is small, three raters yield the most accurate ability estimates, whereas as the number of items increases, the estimation error under four raters drops rapidly and becomes the smallest.
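The abstract evaluates ability recovery with three criteria: the correlation between estimates and true values, the mean absolute bias (MAB), and the root mean square error (RMSE). The sketch below shows how these criteria are typically computed for simulated abilities; it is not the authors' code, and the function name, sample size, and noise level are hypothetical.

 # Illustrative sketch (not the authors' code): the recovery criteria named in the abstract.
 import numpy as np

 def evaluate_ability_recovery(theta_true, theta_est):
     """Return correlation, MAB, and RMSE between true and estimated abilities."""
     theta_true = np.asarray(theta_true, dtype=float)
     theta_est = np.asarray(theta_est, dtype=float)
     corr = np.corrcoef(theta_true, theta_est)[0, 1]          # recovery correlation
     mab = np.mean(np.abs(theta_est - theta_true))            # mean absolute bias (MAB)
     rmse = np.sqrt(np.mean((theta_est - theta_true) ** 2))   # root mean square error (RMSE)
     return corr, mab, rmse

 # Hypothetical usage with made-up simulated data:
 rng = np.random.default_rng(0)
 theta_true = rng.normal(0, 1, size=1000)                 # generating (true) abilities
 theta_est = theta_true + rng.normal(0, 0.3, size=1000)   # noisy estimates
 print(evaluate_ability_recovery(theta_true, theta_est))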

References

 [1]戴海崎,简小珠.(2005).被试作答的偶然性对 IRT 能力估计的影响研究.心理科学,28(6),1433-1436.
[2]康春花,孙小坚,曾平飞.(2016).基于等级反应模型的多水平多侧面评分者模型.心理科学,39(1),214-223.
[3]康春花,辛涛.(2010).基于 IRT 的评分者效应模型及其应用展望.中国考试,(08),3-8.
[4]刘红云,骆方.(2008).多水平项目反应理论模型在测验发展中的应用.心理学报,40(1),92-100.
[5]刘慧,简小珠,张敏强,熊悦欣.(2012).多水平 IRT 的发展与应用述评.心理科学进展,20(4),627-632.
[6]罗照盛.(2012).项目反应理论基础.北京:北京师范大学出版社.
[7]田清源.(2006).主观评分中多面Rasch模型的应用.心理学探新,26(1),70-74.
[8]钟晓玲,康春花,陈婧.(2013).基于 CTT、 GT、 IRT 的评分者信度研究——以某届奥运会女子跳水决赛为例.考试研究,(05),41-52.
[9]周群.(2007).主观题评分标准研究.考试研究,(01),005.
[10] Andrich, D. (1995). Distinctive and incompatible properties of two common classes of IRT models for graded responses. Applied Psychological Measurement, 19(1), 101-119.
[11] Attali, Y. (2014). A ranking method for evaluating constructed responses. Educational and Psychological Measurement, 74(5), 795-808.
[12] Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
[13] DeCarlo, L.T. (2010). Studies of a latent class signal detection model for constructed response scoring II: Incomplete and hierarchical designs. ETS Research Report Series, (1), i-65.
[14] DeCarlo, L.T., Kim, Y., & Johnson, M.S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48(3), 333-356.
[15] de la Cruz, R.E. (1996). Assessment-bias issues in special education: A review of literature. ERIC Document Reproduction Service No. ED390246.
[16] Hombo, C.M., Donoghue, J.R., & Thayer, D.T. (2001). A simulation study of the effect of rater designs on ability estimation. ETS Research Report Series, (1), i-41.
[17] Kim, S., Walker, M.E., & McHale, F. (2010). Investigating the effectiveness of equating designs for constructed-response tests in large-scale assessments. Journal of Educational Measurement, 47(2), 186-201.
[18] Kim, Y. (2009). Combining constructed response items and multiple choice items using a hierarchical rater model. Unpublished doctoral dissertation, Columbia University, New York, NY.
[19] Linacre, J.M. (2007). A user's guide to Facets: Rasch-measurement computer program. Chicago. Online: www.winsteps.com/facets.htm (01.02.08).
[20] Muckle, T.J., & Karabatsos, G. (2009). Hierarchical generalized linear models for the analysis of judge ratings. Journal of Educational Measurement, 46(2), 198-219.
[21] Scullen, S.E., Mount, M.K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85(6), 956-997.
[22] Tutz, G. (1990). Sequential item response models with an ordered response. British Journal of Mathematical and Statistical Psychology, 43(1), 39-55.
[23] Wang, W.-C., & Liu, C.-Y. (2007). Formulation and application of the generalized multilevel facets model. Educational and Psychological Measurement, 67(4), 583-605.
[24] Wetzel, E., Böhnke, J.R., & Rose, N. (2016). A simulation study on methods of correcting for the effects of extreme response style. Educational and Psychological Measurement, 76(2), 304-324.
[25] Wolfe, E.W. (2004). Identifying rater effects using latent trait models. Psychology Science, 46, 35-51.
[26] Wright, B.D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116.

Memo

Memo:
 Funding: National Natural Science Foundation of China (31371047); Humanities and Social Sciences Research Project of the Ministry of Education (16YJA190002); Zhejiang Provincial Natural Science Foundation (LY15C090003). Corresponding author: Xin Tao, E-mail: xintao@bnu.edu.cn.
Last Update: 2018-01-15