Tag Archives: data governance

In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines on top of Spark and Alluxio

Abstract: Legacy enterprise architectures still rely on relational data warehouses and require moving data to, and syncing it with, the so-called “Data Lake”, where raw data is stored and periodically ingested into a distributed file system such as HDFS. Moreover, there are a …

Posted in Agile, Big Data, Machine Learning, Open Source, Scala, Spark

Logical Data Warehouse for Data Science: map raw data directly from source to Spark in-memory with Tachyon

Common problems for large organizations dealing with Big Data and Data Science applications are: data stored in non-scalable infrastructure for analysis and processing; data governance and security policies. 1. Data often resides in a central data warehouse and RDBMS, of which many legacy applications …

Posted in Agile, Big Data, Scala, Spark