TY - JOUR
T1 - Computerized retrieval and classification
T2 - An application to reasons for late filings with the Securities and Exchange Commission
AU - Feldman, Ronen
AU - Rosenfeld, Benjamin
AU - Lazar, Ron
AU - Livnat, Joshua
AU - Segal, Benjamin
PY - 2006
Y1 - 2006
N2 - This study explores a system to retrieve and classify the reasons for late mandatory SEC (Securities and Exchange Commission) filings. From the source documents, the system identifies the reasons for the late filing and classifies them into one or more of seven categories. The system can be used by potential investors who have to track a large number of filings concentrated within a day or two. Our results indicate that the SEC filings may be quite ambiguous, with experienced raters disagreeing on one category for a training sample of 600 filings in about 30% of the cases. However, allowing classifications into more than one category using document-level information yields an accuracy of about 90% in a test sample of 200 filings. We also show that the stock market reactions to over 9,000 late filings vary in an intuitive way according to the classified reasons.
KW - Computerized text classification
KW - accuracy of categorization algorithms
KW - computerized categorization
KW - late filings
UR - http://www.scopus.com/inward/record.url?scp=77956225037&partnerID=8YFLogxK
U2 - 10.3233/ida-2006-10206
DO - 10.3233/ida-2006-10206
M3 - Article
AN - SCOPUS:77956225037
SN - 1088-467X
VL - 10
SP - 183
EP - 195
JO - Intelligent Data Analysis
JF - Intelligent Data Analysis
IS - 2
ER -