Data-Cleaning Process

For each question, there are the following variables:

1. Id: ID of the firm.

2. Raw: complete responses of interviewees.

3. Clean: cleaned responses of interviewees.

4. Upper: an upper limit of interviewees’ responses.

5. Lower: a lower limit of interviewees’ responses.

6. Flag: the subgroup which the responses belong to.

7. Panel: manufacturing (Manufacturing, Electronics_Manu, Metals_Manu) or services.

8. SicDescription: the industry classification of companies, based on the ISIC divisions.

9. CompanySize: companies are classified as small (1 to 19 employees), medium (20 to 249 employees), or large (250 or more employees).
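The size classes above can be written as a small lookup. This is a minimal sketch, not the code used to build the files: the function name company_size is hypothetical, and it assumes that a company with exactly 250 employees counts as large.

```python
def company_size(employees: int) -> str:
    """Map an employee count to the CompanySize class described above."""
    if employees < 1:
        raise ValueError("employee count must be at least 1")
    if employees <= 19:
        return "small"
    if employees <= 249:
        return "medium"
    # Assumption: exactly 250 employees is treated as large.
    return "large"
```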

We categorize responses into eight subgroups, and label them in "flag":

1. Points: if the interviewee gives a specific number, we put that number, expressed as a percentage, in “clean.”

2. Range: if the interviewee gives both an upper bound and a lower bound, we put the bounds in the corresponding columns and put their average in “clean.”

3. Upperbound: if the interviewee gives only an upper bound, we put the upper bound in “Upper” and put “upperbound” in flag.

4. Lowerbound: if the interviewee gives only a lower bound, we put the lower bound in “Lower” and put “lowerbound” in flag.

5. Unsure: if the interviewee is unsure, then we put “unsure” in flag.

6. Unknown: if the interviewee doesn’t know the answer—for example, he might say “I don't know”—then we put “unknown” in flag.

7. Increase/Decrease/Unchanged: if the interviewee describes the value only relative to its preceding value (for example, she might say “increase for 2%” or “unchanged”), we put “increase,” “decrease,” or “unchanged” in flag.

8. No answer: there is no answer for the question (“not available” or “unusable response”).
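The way the first four subgroups fill the Clean, Upper, and Lower columns can be sketched as follows. This is an illustration under stated assumptions, not the original cleaning code: the function name fill_columns, its parameters, and the lowercase label "point" are hypothetical, and the response is assumed to have been read and split into numbers already.

```python
def fill_columns(flag, point=None, lower=None, upper=None):
    """Return the Clean/Lower/Upper/Flag values for one response.

    Hypothetical helper: the numeric parts of the answer are assumed
    to have been extracted from the raw response beforehand.
    """
    row = {"Clean": None, "Lower": None, "Upper": None, "Flag": flag}
    if flag == "point":
        # A specific number goes straight into Clean.
        row["Clean"] = point
    elif flag == "range":
        # Both bounds are stored; Clean holds their average.
        row["Lower"], row["Upper"] = lower, upper
        row["Clean"] = (lower + upper) / 2
    elif flag == "upperbound":
        row["Upper"] = upper
    elif flag == "lowerbound":
        row["Lower"] = lower
    # All other subgroups (unsure, unknown, increase/decrease/unchanged,
    # no answer) carry only the flag, so the numeric columns stay empty.
    return row
```

For example, fill_columns("range", lower=2, upper=4) stores 2 and 4 as the bounds and 3.0 in Clean.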

If the answer is less than 0.2, it is assumed to be a proportion rather than a percentage and is multiplied by 100. For example, when asked, "What do you think will be the inflation rate (for the Consumer Price Index) over the next 12 months?", an answer of 0.1 is recorded in clean as 10, which corresponds to an inflation rate of 10% over the next 12 months.
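The 0.2 rule can be sketched as a one-line conversion. The function name to_percent is hypothetical. The text does not say how negative answers are treated, so applying the threshold to the absolute value is an assumption of this sketch, as is the rounding used to suppress floating-point noise.

```python
def to_percent(answer: float, threshold: float = 0.2) -> float:
    """Rescale small answers that look like proportions to percentages."""
    # Assumption: the threshold is applied to the magnitude, so a small
    # negative proportion such as -0.05 is also rescaled.
    if abs(answer) < threshold:
        # Rounding only removes floating-point noise from the product.
        return round(answer * 100, 6)
    return answer
```

With this sketch, an answer of 0.1 is recorded as 10.0, while an answer of 2.5 is left unchanged.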

Once each survey was cleaned, the master file was constructed by combining all the waves. There are two versions of the master file: one with weights and one without. The two versions exist because a responding company's classification (sector or size) may occasionally change. Employment may change because of lay-offs, growth, or an acquisition in which activity or staff are moved to the location. Similarly, the SIC may change after significant changes in the company's operations. On a few rare occasions, changes in the SIC classification system itself move a 3- or 4-digit SIC to a different branch, which changes the company's classification at the broader sector level. When a company changes its classification (sector or size), it is treated as two different companies in the weights-building process.
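One way to picture the last rule is to key each weighting unit on the firm ID together with its sector and size, so that a reclassified firm produces a second unit. This is only a sketch of that idea; the function name weighting_units and the tuple layout are hypothetical and do not describe how the weights themselves were computed.

```python
def weighting_units(records):
    """Assign a unit number to each distinct (firm ID, sector, size).

    records: iterable of (firm_id, sector, size) tuples, one per
    firm and wave (a hypothetical layout for illustration).
    """
    units = {}
    for firm_id, sector, size in records:
        key = (firm_id, sector, size)
        if key not in units:
            # A firm whose sector or size changed gets a new unit,
            # i.e. it is treated as a different company.
            units[key] = len(units) + 1
    return units
```

A firm observed first as small and later as medium in the same sector would therefore appear as two units.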

Responses classified in the first stage as "point" or "range" were considered useful; they receive the flag ".". Responses classified as "unsure" or "unknown" were reclassified as don’t know ("DK"). Responses classified as "no answer" were reclassified as "NA." All remaining responses (lowerbound, upperbound, increase/decrease/unchanged) were classified as "other."
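This second-stage reclassification amounts to a lookup from the first-stage label to the master-file flag. The sketch below assumes the first-stage labels are stored as the lowercase strings shown; the names SECOND_STAGE and reclassify are hypothetical.

```python
# First-stage label -> master-file flag, following the text above.
SECOND_STAGE = {
    "point": ".",
    "range": ".",
    "unsure": "DK",
    "unknown": "DK",
    "no answer": "NA",
    "lowerbound": "other",
    "upperbound": "other",
    "increase": "other",
    "decrease": "other",
    "unchanged": "other",
}


def reclassify(flag: str) -> str:
    """Return the master-file flag for a first-stage label."""
    if flag not in SECOND_STAGE:
        # Fail loudly on misspelled labels instead of guessing.
        raise ValueError(f"unrecognized first-stage label: {flag!r}")
    return SECOND_STAGE[flag]
```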

A downloadable codebook of variable descriptions is available for each file.