23 Matching Annotations
  1. Oct 2023
    1. Customers are often left to cobble together disparate services without tight integration in the way Microsoft might provide, for example. All this makes the introduction of Amazon Aurora zero-ETL integration with Amazon Redshift such a jaw-dropper. Let’s be clear: in essence, AWS announced that two of its services now work well together. It’s more than that, of course. Removing the cost and complexity of ETL removes the need to build and maintain data pipelines. At heart, though, this is about making two AWS services work exceptionally well together. For another company, this might be considered table stakes, but for AWS, it’s relatively new and incredibly welcome. It’s also a sign of where AWS may be headed: tighter integration between its own services, so that customers needn’t take on the undifferentiated heavy lifting of AWS service integration.
    1. One of the places where customers spend the most time building and managing ETL pipelines is between transactional databases and data warehouses, which is where AWS set its sights.
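      A minimal sketch of the kind of hand-rolled pipeline this refers to, using sqlite3 in memory as a stand-in for both sides (in practice the source would be a transactional database such as Aurora and the target a warehouse such as Redshift, each with its own connector and scheduling around it):

        import sqlite3

        # sqlite3 in memory stands in for both systems so the sketch runs locally;
        # the point is the shape of the work: extract, transform, load, on a schedule.
        source = sqlite3.connect(":memory:")   # stand-in for the transactional database
        target = sqlite3.connect(":memory:")   # stand-in for the data warehouse

        source.executescript("""
            CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL);
            INSERT INTO orders VALUES (1, 'EU', 25.0), (2, 'US', 40.0), (3, 'EU', 15.5);
        """)

        # Extract: pull raw rows out of the transactional store.
        rows = source.execute("SELECT region, amount FROM orders").fetchall()

        # Transform: aggregate into the shape the warehouse expects.
        totals = {}
        for region, amount in rows:
            totals[region] = totals.get(region, 0.0) + amount

        # Load: write the summarized rows into the analytics store.
        target.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
        target.executemany("INSERT INTO sales_by_region VALUES (?, ?)", list(totals.items()))
        target.commit()

        print(target.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())

      Zero-ETL integration replicates the source tables into the warehouse continuously, so none of this glue code has to be written, scheduled, or monitored.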
    1. One potential solution is the use of a “one big table” (OBT) strategy, where all the raw data is placed into one table. This strategy has both proponents and detractors, but leveraging large language models may overcome some of its challenges, such as discovery and pattern recognition. Super early startups such as Delphi and GetDot.AI, as well as more established players such as Amazon QuickSight, Tableau Ask Data, and ThoughtSpot, are driving this trend. (A minimal sketch of the OBT idea follows after this list.)
    2. Snowflake and Databricks are pursuing “no copy data sharing,” which provides expanded access to the data where it’s stored without the need for ETL.
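      A minimal sketch of the "one big table" idea from the first item above, using pandas and made-up tables: instead of keeping normalized tables that every consumer must join, everything is denormalized up front into a single wide table that downstream tools (or an LLM-backed query layer) can scan and reason over directly.

        import pandas as pd

        # Hypothetical normalized source tables.
        orders = pd.DataFrame({
            "order_id": [1, 2, 3],
            "customer_id": [10, 11, 10],
            "amount": [25.0, 40.0, 15.5],
        })
        customers = pd.DataFrame({
            "customer_id": [10, 11],
            "customer_name": ["Acme", "Globex"],
            "region": ["EU", "US"],
        })

        # "One big table": denormalize everything into one wide table so that
        # downstream consumers only ever have to discover and query a single schema.
        obt = orders.merge(customers, on="customer_id", how="left")
        print(obt)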
  2. May 2020
    1. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
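      A minimal sketch of what a Glue ETL job script typically looks like; the catalog database, table name, column mappings, and S3 path here are hypothetical:

        import sys
        from pyspark.context import SparkContext
        from awsglue.context import GlueContext
        from awsglue.job import Job
        from awsglue.transforms import ApplyMapping
        from awsglue.utils import getResolvedOptions

        args = getResolvedOptions(sys.argv, ["JOB_NAME"])
        glue_context = GlueContext(SparkContext())
        job = Job(glue_context)
        job.init(args["JOB_NAME"], args)

        # Extract: read a table registered in the Glue Data Catalog.
        source = glue_context.create_dynamic_frame.from_catalog(
            database="sales_db",        # hypothetical catalog database
            table_name="raw_orders",    # hypothetical table
        )

        # Transform: rename and cast columns into an analytics-friendly shape.
        mapped = ApplyMapping.apply(
            frame=source,
            mappings=[
                ("order_id", "string", "order_id", "string"),
                ("amount", "string", "amount", "double"),
            ],
        )

        # Load: write the result to S3 as Parquet for downstream analytics.
        glue_context.write_dynamic_frame.from_options(
            frame=mapped,
            connection_type="s3",
            connection_options={"path": "s3://example-bucket/curated/orders/"},
            format="parquet",
        )

        job.commit()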

  3. Jan 2020
    1. I like that the Lambda Architecture emphasizes retaining the input data unchanged. I think the discipline of modeling data transformation as a series of materialized stages from an original input has a lot of merit. This is one of the things that makes large MapReduce workflows tractable, as it enables you to debug each stage independently. I think this lesson translates well to the stream processing domain. I’ve written some of my thoughts about capturing and transforming immutable data streams here.

      Great point 👍

      Something I've thought about and emphasized for doing FDF: the ability to debug each step independently, or to re-run the pipeline from a given step.
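      A minimal sketch of that discipline, with hypothetical stages: the raw input is written once and never modified, and every later stage reads the previous stage's materialized output and writes its own, so any stage can be inspected on its own or the pipeline re-run from that point.

        import json
        from pathlib import Path

        STAGES = Path("stages")
        STAGES.mkdir(exist_ok=True)

        def run_stage(name, transform, source):
            """Apply `transform` to the records in `source` and materialize the result."""
            records = [json.loads(line) for line in source.read_text().splitlines() if line]
            out = STAGES / f"{name}.jsonl"
            out.write_text("\n".join(json.dumps(r) for r in transform(records)))
            return out

        # Stage 0: the immutable raw input, written once and never modified.
        raw = STAGES / "raw.jsonl"
        raw.write_text("\n".join(json.dumps(e) for e in [
            {"user": "a", "value": 3},
            {"user": "b", "value": -1},
            {"user": "a", "value": 5},
        ]))

        # Stage 1: clean (drop negative values). Because the output is materialized,
        # it can be inspected on disk, and later stages can be re-run from here.
        cleaned = run_stage("cleaned", lambda rs: [r for r in rs if r["value"] >= 0], raw)

        # Stage 2: aggregate per user.
        def aggregate(rs):
            totals = {}
            for r in rs:
                totals[r["user"]] = totals.get(r["user"], 0) + r["value"]
            return [{"user": u, "total": t} for u, t in totals.items()]

        print(run_stage("totals", aggregate, cleaned).read_text())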

  4. May 2018
    1. Hi there, get the full insights on MSBI tools training and tutorials, with real-time examples and applications on running projects: https://www.youtube.com/watch?v=OzmdY0zCw4g

    2. Hi there, check out these MSBI tools training and tutorial insights, with real-time examples and project analysis on MSBI:

      https://www.youtube.com/watch?v=EdF9tZliIok

    3. Hi there, learn MSBI in 20 minutes, with handwritten explanations of each and every topic in the course, along with real-time examples:

      https://www.youtube.com/watch?v=tFG-VkaSvhI

    4. Get a proper explanation of the ETL testing tools training and tutorial course, with real-time exercises and an understanding of the testing process at each stage, from extraction to loading of data at the client location.

      Check this link for better learning: https://www.youtube.com/watch?v=-vNgcOsHbIU

  5. May 2014
    1. Collaborate for God's sake!: EVERY organization dealing with data is dealing with these problems. And governments need to work together on this. This is where open source presents invaluable process lessons for government: working collaboratively, and in the open, can float all boats much higher than they currently are. Whether it's putting your scripts on GitHub, asking and answering questions on the Open Data StackExchange, or helping out others on the Socrata support forums, collaboration is a key lever for this government technology problem.

      Collaboration is clearly key, but it's not obvious what that means in practice. The suggestions here are a good first step for an organization:

      • scripts on GitHub
      • asking and answering questions on Stack Exchange
      • and (for data) joining the Socrata support forums

      What does it take to get organizations on this path?

      And what steps are next once the organization has evolved to this point?