methods when all variables are measured with error #3

@BERENZ

In this case we assume that we observe $(y_i^*, x_i^*, z_i)$ in the non-probability sample and $(x_i, z_i)$ in the probability sample (or population), where $^*$ indicates that a given variable is misclassified.

A motivating example is as follows:

  • target variable: whether English is required. This may be stated explicitly in a given ad, but for some ads it is missing and we could derive it from the text (say, the ad is written in English, or the text states that English is "our language")
  • auxiliary variables ($X$): information about the occupation is missing, so we derive it using our classifier
  • auxiliary variables ($Z$): information about a given company, measured without error (say size, NACE, public/private)

Research questions:

  • how can we deal with such cases? what does the literature say about this?
  • what is the bias when we estimate a regression model for $E(y_i^* | x_i^*, z_i)$ instead of $E(y_i | x_i, z_i)$?
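
To get a first feel for the second question, a small simulation may help. The sketch below is only an illustration under assumptions that are not part of this issue: all variables are binary, the true model is a linear probability model with made-up coefficients, and misclassification is independent of everything else, with hypothetical flip rates of 15% for $x$ and 10% for $y$. It compares the least-squares fit of $y$ on $(x, z)$ with the naive fit of $y^*$ on $(x^*, z)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Error-free company-level covariate z (binary here, e.g. public/private).
z = rng.binomial(1, 0.5, size=n)
# True occupation indicator x, correlated with z (assumed probabilities).
x = rng.binomial(1, 0.3 + 0.2 * z)
# True target y from an assumed linear probability model:
# P(y = 1 | x, z) = 0.2 + 0.4 * x + 0.1 * z
y = rng.binomial(1, 0.2 + 0.4 * x + 0.1 * z)


def flip(values, rate):
    """Misclassify a 0/1 variable by flipping each entry with probability `rate`."""
    flips = rng.random(values.shape[0]) < rate
    return np.where(flips, 1 - values, values)


# Observed, misclassified versions (hypothetical, non-differential error rates).
x_star = flip(x, 0.15)
y_star = flip(y, 0.10)


def ols(response, *covariates):
    """Least-squares coefficients for intercept plus the given covariates."""
    design = np.column_stack([np.ones(len(response)), *covariates]).astype(float)
    coef, *_ = np.linalg.lstsq(design, response.astype(float), rcond=None)
    return coef


beta_true = ols(y, x, z)             # regression of y on (x, z)
beta_naive = ols(y_star, x_star, z)  # regression of y* on (x*, z)

print("intercept, x, z (true): ", np.round(beta_true, 3))
print("intercept, x, z (naive):", np.round(beta_naive, 3))
```

Under these assumptions the naive coefficient on $x^*$ is pulled towards zero for two reasons: independent flipping of $y$ at rate $r$ gives $E(y_i^* | x_i, z_i) = r + (1 - 2r) E(y_i | x_i, z_i)$, which rescales the slope, and misclassification of $x$ attenuates it further. With differential or correlated errors the direction of the bias need not be the same.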
