Denis Nekipelov - Berkeley

Identification, Data Combination and the Risk of Disclosure

    Date:  10/11/2011 (Tue)

    Time:  3:30pm - 5:00pm

    Location:  Seminar will be held on-site: Social Sciences 113

    Organizer:  Peter Arcidiacono


Meeting Schedule: (Not currently open for scheduling. Please contact the seminar organizer listed above.)

    All meetings will be held in the same location as the seminar unless otherwise noted.

    12:00pm - Lunch

    3:30pm - Seminar Presentation (3:30pm to 5:00pm)


    Additional Comments:  ABSTRACT Businesses routinely rely on estimates from models of consumer behavior. Estimating such models may require combining the firm's internal data with external datasets to control for sample selection, missing observations, omitted variables and measurement errors in the firm's data. In this paper we show that data mining techniques can be used in combination with econometric analysis to identify consumer behavior models from combined data under mild assumptions on the data distribution. We demonstrate that point identification of a consumer behavior model from combined data is incompatible with restrictions on the risk of individual disclosure. As a result, if the consumer model is point identified, the firm will also learn the identity of at least some consumers. Using a simple example of online display advertising, we demonstrate that unless the firm restricts the risk of individual disclosure when combining data, an adversary can learn confidential information about some individuals from the estimated model, even if the raw combined dataset is never shared with a third party.