35 Matching Annotations
  1. Sep 2024
    1. This literature review has explored the critical aspects of data quality in wireless sensor networks (WSNs) with a focus on pedestrian monitoring. Key dimensions such as accuracy, completeness, consistency, timeliness, and validity have been examined, highlighting the importance of maintaining high data quality for reliable pedestrian data. [Page footer: Carrow Morris-Wiltshire, June 27, 2024. RADIAN – A Library for Scalable Quality-Aware Pedestrian Data Streams] Methodologies for data quality assessment, including statistical measures, machine learning approaches, and event detection techniques, have been reviewed. These methodologies are essential for real-time detection and monitoring of data anomalies, ensuring the reliability of pedestrian counts. The review also covered strategies for managing and improving data quality, such as automated imputation for missing data and denoising techniques, which are crucial for maintaining data integrity. Enhancements in WSN architecture aimed at improving data quality from the source were discussed, providing insights into building a scalable pedestrian data quality management system. Challenges and future directions emphasise the need for centralised governance and standardisation when building such platforms, ensuring interoperability and scalability in smart city applications. This research contributes to the development of a robust and scalable system for managing pedestrian data quality, facilitating more accurate and reliable urban mobility monitoring. In summary, maintaining high data quality in WSNs for pedestrian monitoring requires a comprehensive approach, integrating advanced data quality assessment methodologies, real-time monitoring, and proactive management. This ongoing research aims to address current challenges and support the creation of effective, scalable data quality management systems for smart cities.

      The summary is OK, but I would like to see you really draw together the material to produce a set of recommendations for what needs to be done to deliver a DQ capability for IoT/WSNs.

    2. 3.4.2 Data Management Platform Architecture: Case Studies
       Management frameworks for IoT/WSN follow a few main themes: edge computing; data integration techniques; cloud computing; and data analytics. Badidi et al. (2018) identify key features of an urban data stream management and processing pipeline as: facilitating real-time event detection; notification of alerts; mining the opinions of citizens regarding the governance of their city; and building monitoring dashboards. The authors implement a prototype using the Kafka messaging platform. Whilst there are a number of systems that have been developed for managing data streams, there are few that focus on data quality management that have been fully implemented. Ehrlinger et al. (2019) present a data quality management methodology that uses machine learning algorithms to detect and correct data quality issues in real-time for industrial IoT applications, which is the closest to a functioning management system for WSN data in the literature.

      You introduce some architecture material earlier. This again feels a little out of place to me. Would it be better to cover the methods entirely and then have a separate section on architecture and the implementation of the digital infrastructure?

    3. Although data quality improvement is beyond the scope of this research, it is important to be cognisant of these methods to inform development decisions for the quality-aware system. There are a number of different methods for achieving this.
       3.4.1 Correcting Errors
       Teh et al. (2020) identify two categories of error-correcting methods in WSN data: missing data imputation, which attempts to estimate sensor measurement values that are missing, and de-noising, which aims to remove the noise associated with the measurement signal. The authors reference a number of different methods for each of these categories, including: association rule mining (Gruenwald et al. 2007), clustering (Tang et al. 2015) where the authors use a hybrid model for missing traffic volume estimation, k-nearest neighbour (Li & Parker 2014), and singular value decomposition (Xu et al. 2017) which the authors demonstrate on real-world air quality datasets.

      This feels like it is in the wrong place to me. It could either come at the end (before the summary/conclusion) or you could bring it right up front and state that you are not addressing improvement, but that being able to assess DQ is a first step towards knowing when and if you can correct errors.
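
To make the idea of missing data imputation in annotation 3 concrete, here is a minimal sketch in Python. It is a deliberately simplified, temporal-only nearest-neighbour fill and is not the method of Li & Parker (2014) or any other cited paper. The function name `knn_impute`, the parameter `k`, and the sample counts are all hypothetical.

```python
from typing import List, Optional


def knn_impute(series: List[Optional[float]], k: int = 2) -> List[Optional[float]]:
    """Fill each missing value (None) with the mean of the k observed
    values closest to it in time. Hypothetical helper for illustration."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    if not observed:
        return list(series)  # no observed values to borrow from
    filled: List[Optional[float]] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(v)
            continue
        # Rank observed points by their temporal distance to the gap.
        nearest = sorted(observed, key=lambda pair: abs(pair[0] - i))[:k]
        filled.append(sum(val for _, val in nearest) / len(nearest))
    return filled


# Assumed example: pedestrian counts with one missing interval.
counts = [12.0, 14.0, None, 18.0, 20.0]
imputed = knn_impute(counts, k=2)
```

A sketch like this only works once a gap has been identified, which is consistent with the comment above that assessing DQ comes before correcting errors.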

    4. NOTE In this experiment the GRU performed best a window length of 60 data points andan output layer size of 32. The authors also found that using min-max normalisationcompared to z-score normalisation resulted in high misjudgement rate.

      In general not very keen on use of Notes. This insight should probably be in the main text. We can discuss when we meet.
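
If this insight moves into the main text, a short illustration of the two normalisation schemes and the windowing step may help readers. The sketch below is an assumption-laden illustration, not the cited experiment: the helper names and the synthetic `readings` are hypothetical, and only the window length of 60 comes from the quoted note. It shows one well-known sensitivity of min-max scaling, namely that a single outlier compresses all other values; the quoted passage does not say whether this explains the reported misjudgement rate.

```python
from statistics import mean, pstdev
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: List[float]) -> List[float]:
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sliding_windows(values: List[float], length: int) -> List[List[float]]:
    """Split a series into overlapping windows of a fixed length."""
    return [values[i:i + length] for i in range(len(values) - length + 1)]


# Synthetic series (assumed): a steady ramp with one extreme outlier.
readings = [float(v) for v in range(100)]
readings[50] = 500.0
scaled_mm = min_max(readings)   # the outlier pushes most values below 0.2
scaled_z = z_score(readings)
windows = sliding_windows(scaled_z, 60)  # window length taken from the note
```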

    5. System architecture is an important component in detection and monitoring as it requires a number of different processes running together. Mehmood et al. (2024) develop a portable hybrid architecture for smart cities based on device edge and cloud computing. The hybrid system combines LSTMs, Page-Hinkley test, adaptive windowing, and Kolmogorov-Smirnov windowing. Şimsek et al. (2024) integrate Transformers, CNN-LSTM, GRU, and RFR models into their hybrid deep-learning detection system that uses fog computing, complex event processing, and virtualisation to do event detection. Whilst these systems are not directly applicable to the research, they provide a good foundation for understanding the architecture of a real-time monitoring system.

      This is good, but depending on how you choose to structure the PhD you may lose a reader without explaining what an LSTM, CNN, GRU, etc. is.
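
The same concern applies to the drift detectors named in annotation 5. As one example of the kind of explanation that could accompany them, here is a minimal sketch of a one-sided Page-Hinkley test for an upward shift in the mean. It is a textbook-style formulation, not the implementation used by Mehmood et al. (2024); the function name, the `delta` and `threshold` defaults, and the synthetic streams are assumptions.

```python
from typing import Iterable, Optional


def page_hinkley(stream: Iterable[float],
                 delta: float = 0.05,
                 threshold: float = 10.0) -> Optional[int]:
    """Return the index at which an upward mean shift is first flagged,
    or None if no shift is flagged. Parameter values are illustrative."""
    running_mean = 0.0
    cumulative = 0.0  # cumulative deviation from the running mean
    minimum = 0.0     # smallest cumulative deviation seen so far
    for t, x in enumerate(stream, start=1):
        running_mean += (x - running_mean) / t
        cumulative += x - running_mean - delta
        minimum = min(minimum, cumulative)
        # Flag a change when the deviation rises far above its minimum.
        if cumulative - minimum > threshold:
            return t - 1
    return None


# Assumed synthetic streams: one stable, one with a step change at index 50.
stable = [10.0] * 50
shifted = [10.0] * 50 + [15.0] * 50
no_change = page_hinkley(stable)
change_at = page_hinkley(shifted)
```

The `delta` term sets how much deviation is tolerated as noise, and `threshold` trades detection delay against false alarms.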

    6. Table 2

      Having read ahead I can only really see two methods covered in the examples/text ... I was expecting a review (even just a short one) of the different approaches before you drop into the architecture considerations.

      Don't do anything on this before we next meet but we should discuss this and agree if more specific coverage of the methods is required.

    7. NOTE Monitoring is the continuous observation and collection of data, while detectionis the process of identifying specific events, anomalies, or patterns of interest withinthat monitored data. Monitoring provides visibility and context, while detection pinpointsspecific occurrences or conditions that require attention or action (Tuychiev 2023).

      This is important but should not be presented as a note, it should probably be integrated into the main text.
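
One way to make the distinction in annotation 7 tangible in the main text is a small example that separates the two roles. The sketch below is illustrative only: `StreamMonitor`, `detect`, the window size, the `max_ratio` rule, and the sample counts are all hypothetical and are not taken from Tuychiev (2023).

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, List


class StreamMonitor:
    """Monitoring: continuously collect readings and expose context."""

    def __init__(self, window: int = 5) -> None:
        self.recent: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> Dict[str, float]:
        # The baseline is computed from earlier readings only, so a spike
        # does not inflate the context it is compared against.
        baseline = mean(self.recent) if self.recent else value
        self.recent.append(value)
        return {"latest": value, "baseline": baseline}


def detect(snapshot: Dict[str, float], max_ratio: float = 3.0) -> bool:
    """Detection: flag a reading that exceeds max_ratio times the baseline."""
    baseline = snapshot["baseline"]
    return baseline > 0 and snapshot["latest"] > max_ratio * baseline


# Assumed pedestrian counts with one implausible spike.
counts: List[float] = [10.0, 11.0, 9.0, 10.0, 100.0, 10.0]
monitor = StreamMonitor(window=5)
alerts = [i for i, c in enumerate(counts) if detect(monitor.observe(c))]
```

The monitor runs on every reading and never raises an alert by itself; the detector consumes the monitor's context and decides whether a reading needs attention.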

    8. Rule-based | Threshold-based detection, finite state machines, and expert systems. | Simple to implement but may struggle with complex event patterns and adaptability.
       Machine learning | Supervised learning: decision trees, support vector machines, naive Bayes, etc. Unsupervised learning: clustering, anomaly detection, etc. | Can adapt to complex event patterns but require sufficient labelled data for training.
       Deep learning | Convolutional neural networks (CNNs), recurrent neural networks (RNNs), graph neural networks (GNNs). | Can handle high-dimensional and unstructured data but require large amounts of training data and computational resources.
       Statistical and probabilistic | Hidden Markov models (HMMs), Bayesian networks, and Gaussian mixture models (GMMs). | Can capture the uncertainties and dependencies in event occurrences but may require prior knowledge of event distributions.
       Hybrid | Combining rule-based and machine learning methods or integrating deep learning with statistical models. | Can provide more robust and accurate event detection but may increase the complexity of the system.

      For the PhD write-up you probably need to provide a more detailed critique of each approach in terms of its success, advantages, limitations, etc., but for now this is fine. For the paper we will need to take the table, probably convert it to text, and provide pointing references to the state of the art in each method.

    9. assessing of the quality of raw data as such, without considering the context or the intended use of data.

      Should this be in quotation marks (" ")? Same for the next statement?

    10. The intersection of data quality and data security issues can be summarised as follows: solutions are needed that can scale to the massive number of devices; resource constraints of devices limit applicability of existing techniques and call for lightweight approaches; and enforcing policies (cross-industry standardisation) is important.

      Struggling to see how this defines the intersection between DQ and DS??

    11. Mansouri et al. (2023) provide a reliable and recent review of IoT data quality literature. Whilst there are limited significant changes to the dimensions proposed by Karkouch et al. (2016), the authors map key issues to the core dimensions from the Karkouch et al. (2016) framework, whilst highlighting how these issues arise from a specific set of problems

      Found this quite hard to follow - maybe consider how it can be re-written to be clearer.

    12. T

      Love the table but maybe consider giving a real example here of the issues impacting a dimension and a single problem having multiple DQ issues.

    13. 3.1.2 Internet of Things Data Quality Taxonomy

      Ah - in relation to previous comment. Ok you actually do what I suggest in the next paragraph. Maybe consider not having this as a separate section (i.e., 3.1.2) but just have the text flow straight into the IoT paragraph - this will give the first bit on DQ context and linkage to IoT/WSNs.

    14. DQ taxonomies have developed significantly over the last few decades as technology has developed and the age of ‘big data’ and IoT has emerged. Although it is difficult to identify a single “seminal” paper on DQ, one of the most influential and widely cited papers on DQ dimensions is Wang & Strong (1996). The authors describe DQ as ‘data that are fit for use by data consumers’. Their taxonomy presented in Figure 1 describes four main categories of DQ dimensions: intrinsic, contextual, representational, accessibility.

      So how does this impact or help inform your understanding in relation to WSN pedestrian data? I think you maybe need a few sentences of critique of these categories and their suitability in relation to IoT/WSNs.

    15. The review will be structured as follows. The first section will explore what is meant by data quality in the context of wireless sensor networks (WSN) and examine the established taxonomies for quantifying data quality. The second section will investigate common methodologies used to assess data quality in WSN. The third section, which will be the main focus of the forthcoming research, will investigate methods for detecting and monitoring data quality in real-time. The fourth section will briefly investigate the methods used to manage and improve data quality such as automated methods for missing data imputation and denoising. The fifth section, also brief, will explore the methods used to improve the design of WSN architecture so as to improve future data quality from the source. The final section will explore the challenges and future directions in building quality-aware platforms for smart cities.
        3.1 Data Quality Dimensions and Metrics
        3.2 Data Quality Assessment
        3.3 Data Quality Detection and Monitoring
        3.4 Data Quality Management and Improvement
        3.5 Data Quality Prediction and Proactive Approaches
        3.6 Challenges and Future Directions

      This can be removed.

    16. • Adopt/adapt an existing taxonomy for data quality dimensions in object-detecting wireless sensor networks.
        • Establish the key themes from the literature for real-time data quality management and monitoring frameworks.
        • Discuss existing methods for quantifying data quality in IoT networks.
        • Present existing case studies that have implemented urban data quality management and monitoring pipelines.

      Any references here that you can point to that support the assertion that these are the important aspects to consider?? You probably cite papers later that in their introductions state these as being important aspects to understand.

    17. This research aims to address some of the challenges in reducing human involvement by developing a library that enables the creation of scalable and quality-aware pedestrian data stream pipelines. The library will provide a comprehensive framework for automated data quality assessment, cleaning, and monitoring, empowering end-users to make informed decisions based on reliable and trustworthy sensor data. By ensuring data quality throughout the pipeline, this research seeks to enhance the accuracy, reliability, and effectiveness of automated decision systems in pedestrian activity monitoring.

      This is fine from the technical perspective, BUT I would also like to see an articulation of the intellectual challenge that you are looking to address: a stronger focus on understanding when, where and why data is or is not appropriate for a particular 'real-time' application in relation to better informed autonomous decision making in the built environment. Then state the technical objective of developing a library that aims to achieve this.

    18. .

      Good introduction. For the paper this (with a little editing) would suffice for the Introduction if you brought the objectives into it to state the challenge you are looking to address.

    19. , just as accurate sensory information is vital for living organisms to flourish

      I like the analogy with biological systems, but be a bit careful not to use it too much!

    20. The importance of data quality in WSNs for smart cities cannot be overstated.

      Maybe better to say:

      "WSNs data quality is critically important in relation to the ability to utilise IoT to make better decisions regarding the dynamics of the built environment"