Scientific Journal of King Faisal University
Basic and Applied Sciences

An Overview of Query Processing on Crowdsourced Databases

(Marwa B. Swidan, Ali A. Alwan, Yonis Gulzar, Abedallah Zaid Abualkishik)

Abstract

Crowd-sourcing is a powerful solution for finding correct answers to expensive and unanswered queries in databases, including those with uncertain and incomplete data. Using human workers to process these expensive queries has helped to provide accurate results by drawing on the data available in the crowd. Crowd-sourcing database systems (CSDBs) combine the knowledge of the crowd with a relational database by using some variant of a relational database with minor changes. This paper surveys the leading studies conducted in the area of query processing with regard to both traditional and preference queries in CSDBs. The focus of this work is on highlighting the strengths and weaknesses of each approach. A detailed discussion of current and future research trends associated with query processing in CSDBs is also presented.

Keywords
Incomplete data model, preference query, query processing, skyline query, top-k query
