Code of Conduct
How to extract data from the web with integrity
Automated browsing and collection of content from public websites is legal in the USA, the EU and other jurisdictions. At Import.io we aim to ensure that our Web Data Integration projects are not just legal but that they do no harm to the websites that we automatically browse. What follows are the minimum standards that we expect all Import.io employees, contractors and agents to adhere to when extracting data from the web using Import.io technology or on behalf of Import.io or Import.io’s customers.
Do…
- Do access and request data at a reasonable rate so that site performance is not impaired.
- Do tailor your approach to the site (considering characteristics such as the site owner’s market size, geographic location, time zone and the site architecture) and adjust your approach as required when the website changes or shows signs of performance interruption.
- Only collect the data that you need.
- Do use commercially reasonable efforts to comply with robots.txt including monitoring for changes and adjusting practices accordingly.
- Do immediately report a cease and desist or similar communication from a site owner to Import.io management.
- Do respect intellectual property rights, in particular copyright laws.
- Avoid known litigants.
- If you suspect that an extracted field may contain PII, then use Import.io’s PII-redaction feature in order to remove PII and to prevent a WDI project from inadvertently collecting PII.
- Only extract data from websites requiring a login where you have valid login credentials.
- If a site has a paywall, Import.io must not be used to circumvent it; either the customer or Import.io must have a valid subscription.
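The robots.txt point above can be illustrated with a minimal Python sketch. It uses only the standard library's `urllib.robotparser`; the helper names, the sample rules, the `example-agent` token and the one-second default delay are hypothetical choices made for illustration and are not Import.io settings or tooling.

```python
import hashlib
from urllib.robotparser import RobotFileParser


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the parsed rules allow this agent to request the URL."""
    return parser.can_fetch(user_agent, url)


def crawl_delay(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it declares one, else an assumed default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def fingerprint(robots_txt: str) -> str:
    """Hash the file so that a later download can be compared for changes."""
    return hashlib.sha256(robots_txt.encode("utf-8")).hexdigest()


# Hypothetical rules, used only to exercise the helpers above.
SAMPLE_RULES = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
policy = build_policy(SAMPLE_RULES)
```

One possible workflow is to store the fingerprint alongside each project, download robots.txt again on a schedule, and review the project whenever the fingerprint differs from the stored value.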
Do not…
- Do not interfere with the operation of the site or impair the site’s performance.
- Do not engage in practices which may be perceived as a DDoS attack.
- Do not harvest protected or commercially sensitive information (e.g. data on financial transactions, credit card numbers or IDs of the site owner’s clients) on behalf of Import.io. The only exception to this rule is when the site owner directs or expressly permits the data collection.
- Do not harvest data that, by its substantiality and criticality, is likely to harm revenues of the site owner.
- Do not declare any information about Import.io in the user agent.
- Do not automate the acceptance of terms of service.
- Do not support WDI projects where you believe that creative works extracted from a website will be republished or redistributed in breach of copyright.
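The first two points in this list, together with the earlier instruction to adjust when a site shows signs of performance interruption, can be sketched as a simple adaptive delay between requests. This is a minimal illustration in Python, not an Import.io implementation: the function name, the choice of HTTP 429 and 503 as signs of strain, and the one-second and sixty-second bounds are all assumptions.

```python
# Status codes treated here as a sign that the site is under strain (an assumption).
STRAIN_STATUSES = (429, 503)


def next_delay(current: float, status: int, base: float = 1.0, ceiling: float = 60.0) -> float:
    """Return the number of seconds to wait before the next request to a site.

    The delay doubles (up to `ceiling`) when the last response suggests strain,
    and halves back toward `base` after a normal response.
    """
    if status in STRAIN_STATUSES:
        return min(current * 2, ceiling)
    return max(base, current / 2)
```

A collector would sleep for the returned number of seconds between requests to the same site, so that repeated strain signals slow it down quickly while recovery toward the base rate is gradual.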
When harvesting personally identifiable information (PII):
- Ensure that permission is granted by the CFO following consultation and a data protection impact assessment with legal counsel.
- Ensure that the “legitimate interests” justification, under which the project is initiated, is maintained over the course of the project.
- For approved projects involving the collection of PII, Import.io will usually be a data processor and our customer will be the data controller. This relationship will be explicitly defined in the contract governing the project.
- Where Import.io is a data processor, any PII that is collected should only be temporarily stored in our systems for the sole purpose of fulfilling our data processing obligations. Once the processed PII has been successfully transmitted to the data controller, it should be destroyed from Import.io systems.
- If the occasion ever arises where Import.io is both data processor and data controller then the following additional rules should be followed:
  - Provide a means for data subjects to exercise their rights (such as opt-out, erase, subject access request);
  - For EEA data subjects, ensure that they are made aware that Import.io holds their data within a month of data collection; and
  - Adhere to the Import.io Records Retention Policy.
- Do not harvest categories of PII considered to be “sensitive”, requiring additional care when handling, including health, racial or ethnic origin, sexual life or orientation, religious or philosophical opinions, trade union membership or genetic or biometric data (for the purpose of uniquely identifying a living individual).
Last updated: 8 April 2020