In one way or another, we are all affected by data bias, and this can be seen very clearly in the world of marketing. To address the elephant in the room, take for example how Cambridge Analytica used Facebook to collect data and influence the public vote. This, in turn, forced Facebook to fix a problem it had no plans to address (well, not publicly). This pattern of outrage followed by review is also very common with Artificial Intelligence tools built on biased data sets, which at a minimum stop a Black person from using an automatic hand dryer and at worst make Black people crossing the street invisible to a driverless car. While exploring how others were trying to fix the problem of biased AI, it became clear that the people involved from start to finish are the ones who benefit, and no matter what, they will always input their own bias. So instead of trying to fix a broken process, what would it look like to create my own biased AI?
For my fellowship, I began to explore the following questions (which led to more questions, but we can get to that later):
- Can data be positively biased towards marginalised groups?
If we are already aware that the large data sets used to build technology are biased in favour of their creators (cis, able-bodied white men), what would happen if we flipped this bias to benefit those from the most marginalised groups in society (women, People of colour, Black & Brown people, Disabled people, and those from the LGBTQIA+ community)?
- Can we train a machine with data intentionally biased in favour of those from marginalised backgrounds?
Would this machine develop the same bias as other machines but produce different results? Machine learning models need a lot of data. Can this data be collected and stored ethically, without being affected by wider societal bias? Could such a model produce positive results for Black people when used in processes like CV sorting, government passport photo approvals and facial recognition? (A rough sketch of what intentionally tilting a training set could look like follows these questions.)
- How can we disrupt data bias and use data to promote inclusion?
If we question the full process, from data collection to model training, and replace the people currently influencing each step with people from marginalised backgrounds, could the results shift?
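Before getting into the research, it helps to see what "intentionally biased data" could mean in practice. Below is a minimal sketch of one common lever, oversampling: duplicating records from a chosen group so the model sees them disproportionately often during training. Everything in it is a hypothetical illustration; the column names, the toy CV-sorting data and the 3x factor are my own assumptions, not a tested recipe.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def oversample_group(df: pd.DataFrame, group_col: str, group_value: str,
                     factor: int = 3) -> pd.DataFrame:
    """Duplicate the rows belonging to `group_value` so they appear
    `factor` times, deliberately tilting the training distribution
    in that group's favour."""
    favoured = df[df[group_col] == group_value]
    return pd.concat([df] + [favoured] * (factor - 1), ignore_index=True)

# Hypothetical CV-sorting data: one toy feature, a self-identified group,
# and whether the candidate was shortlisted in past (biased) decisions.
data = pd.DataFrame({
    "years_experience": [2, 7, 4, 1, 9, 3],
    "group": ["Black", "white", "Black", "white", "Black", "white"],
    "shortlisted": [1, 1, 0, 0, 1, 1],
})

# Flip the usual imbalance: the model now trains on three copies of
# every record from the favoured group.
tilted = oversample_group(data, "group", "Black", factor=3)
model = LogisticRegression()
model.fit(tilted[["years_experience"]], tilted["shortlisted"])
```

Oversampling is only one lever; the same tilt could be applied at training time instead (for example via the `sample_weight` argument to `fit`), and weighing up those trade-offs is part of what I want to explore.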
The current data collection to model training process (sketched as code after this list):
- The decision that the data should be collected is made by:
- The method and questions to collect the data are created by:
- The labelling of the data is done by:
- The cleaning of the data is done by:
- The machine model is trained by:
- The results for the model are tested by:
- The product is marketed to: Everyone
- The product is meant to benefit: Everyone
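To make those steps concrete, here is the same pipeline as a runnable Python skeleton. Every class and function in it is a hypothetical stub of my own, not a real system; the point is structural: a single `Team` object owns every decision, so swapping in a different team changes who influences each step.

```python
from dataclasses import dataclass

@dataclass
class Team:
    """Whoever fills these roles injects their own bias at each step."""
    name: str

    def decide_what_to_collect(self):       # the decision to collect the data
        return {"fields": ["photo", "cv_text"]}

    def design_collection(self, spec):      # the method and questions
        return [{"photo": "...", "cv_text": "..."}]

    def label(self, records):               # the labelling of the data
        return [dict(r, label=1) for r in records]

    def clean(self, records):               # the cleaning of the data
        return records

    def test(self, model):                  # the results are tested
        return True

def train(records):                         # the machine model is trained (stub)
    return {"trained_on": len(records)}

def build_model(team: Team):
    spec = team.decide_what_to_collect()
    raw = team.design_collection(spec)
    labelled = team.label(raw)
    cleaned = team.clean(labelled)
    model = train(cleaned)
    assert team.test(model)
    return model

# Swap who runs the process, and you swap whose bias shapes every stage.
model = build_model(Team(name="marginalised-led team"))
```

The question behind the fellowship is what changes in the output when the `Team` handed to `build_model` is drawn entirely from marginalised groups.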
As I began to research, learn and explore these questions, I grouped my findings into the following areas:
Marginalised Groups
- How are marginalised groups defined
- Who determines the group to be marginalised
- How do marginalised groups identify themselves
- Can these identities be binary
Data Bias
- How can data bias be manipulated
- At what point can the data be biased
- Can global biases like colourism be controlled and removed
Machine Learning
- How much data is required to train a machine
- What percentage of the data will need to be manipulated
Positive Bias
- Can intentionally biased data be used for good
- Is positive bias a solution that can be scaled
- How and when has manipulation worked or failed in the past
- What are the long-term effects on society of tipping the scales
Data bias is a broad and deep subject area, and in terms of machine learning it is new to me. So, to stay in my lane and not fall down a rabbit hole, all of the questions above will be explored through the lens of my sector expertise: marketing.