Quick tech specs
- Ideal for increasing the ventilation in all rack enclosure server cabinets
- 2A current draw
- Compatible SmartRack Enclosures: SR25UB/ SR25UBKD/ SR25UBSP1/ SR25UBTAA/ SR42UB/ SR42UBCL/ SR42UBEXP/ SR42UBG/ SR42UBKD/ SR42UBSP1/ SR42UBTAA/ SR45UB/ SR48UB/ SR48UBCL/ SR48UBEXP/ SR48UBSP1 and other rack enclosures that specify the use of this
- Features (6) 120V fans mounted in a roof panel w/ a 6-ft cord
- Made of cold rolled steel w/ black finish
- Tripp Lite is now a part of Eaton
Know your gear
Roof Mount Fan Panel - 120V. Ideal for increasing the ventilation of SmartRacks. Features (6) 120V fans mounted in a roof panel, 6-ft. cord. Required mounting hardware included. Cold rolled steel with black finish.
Tripp Lite Rack Enclosure Cabinet Roof Mount Fan Panel Airflow Mgmt 120V is rated 4.40 out of 5 by 34 reviewers.
Rated 5 out of 5 by Bernd Stroehle
An open-source solution that has limitations in processing too many jobs
What needs improvement?
Apache Airflow improved workflow efficiency, but we had to find solutions for large workflows. For instance, a monthly workflow with 1200 jobs had to be split into three to four pieces as it struggled with large job numbers. Loading a workflow with 500 jobs could take 10 minutes, which wasn't acceptable.
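The splitting described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the reviewer's actual approach: the job names and the chunk size of 400 are hypothetical, and the sketch ignores dependencies between jobs that would cross a chunk boundary.

```python
from typing import Iterator, List


def split_jobs(jobs: List[str], max_per_workflow: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most max_per_workflow jobs."""
    if max_per_workflow < 1:
        raise ValueError("max_per_workflow must be at least 1")
    for start in range(0, len(jobs), max_per_workflow):
        yield jobs[start:start + max_per_workflow]


# Hypothetical monthly workflow with 1200 jobs, split into pieces of 400.
monthly_jobs = [f"job_{i:04d}" for i in range(1200)]
pieces = list(split_jobs(monthly_jobs, 400))
print(len(pieces), [len(p) for p in pieces])
```

In a real migration the split points would have to follow the dependency structure, so that each piece can run as a workflow of its own.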
The most important feature Apache Airflow lacks is support for external configuration files. All classical schedulers like Control-M or Automic allow you to load workflow definitions from YAML, XML, or JSON files, but Airflow requires you to write Python programs. Airflow only supports external configuration for variables, not for workflows. To address this, I created a YAML configuration file that I converted into Python programs, but this functionality is missing from Apache Airflow itself.
All of its competitors have this feature. In Control-M, Automic, and IBM's scheduler, you can load workflows from XML, JSON, or YAML files.
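A rough sketch of the workaround the reviewer describes, turning an external workflow definition into a Python DAG file, might look like the following. Several things here are assumptions: JSON stands in for the reviewer's YAML so that only the standard library is needed, the config layout (dag_id, tasks, upstream) is invented for illustration, task ids are assumed to be valid Python identifiers, and the generated file uses the Airflow 2.x import path for BashOperator. The sketch only generates and syntax-checks the source text; it does not import Airflow.

```python
import json

# Hypothetical workflow definition; the reviewer used YAML, JSON is used here
# only so the sketch needs nothing outside the standard library.
CONFIG = """
{
  "dag_id": "monthly_close",
  "tasks": [
    {"id": "extract", "command": "echo extract", "upstream": []},
    {"id": "load", "command": "echo load", "upstream": ["extract"]}
  ]
}
"""


def render_dag_source(config: dict) -> str:
    """Render Python source text for an Airflow DAG file from a config dict."""
    lines = [
        "from datetime import datetime",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        f"with DAG(dag_id={config['dag_id']!r}, start_date=datetime(2024, 1, 1)) as dag:",
    ]
    # One operator per configured task (task ids double as variable names).
    for task in config["tasks"]:
        lines.append(
            f"    {task['id']} = BashOperator("
            f"task_id={task['id']!r}, bash_command={task['command']!r})"
        )
    # One dependency line per upstream edge.
    for task in config["tasks"]:
        for parent in task["upstream"]:
            lines.append(f"    {parent} >> {task['id']}")
    return "\n".join(lines) + "\n"


source = render_dag_source(json.loads(CONFIG))
compile(source, "generated_dag.py", "exec")  # syntax check only; nothing is executed
print(source)
```

Writing the generated text into the DAGs folder would be the remaining step; how the reviewer handled that is not stated.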
For how long have I used the solution?
I've been familiar with Apache Airflow for about three to four years. I worked on a project at a leading German bank for two years, successfully migrating large applications with hundreds of jobs. However, the leading German bank paused its migration strategy due to issues with the team in India. They're likely waiting for version 3, which is expected next year.
What do I think about the stability of the solution?
I rate the tool's stability a nine out of ten.
What do I think about the scalability of the solution?
I rate the product's scalability a seven out of ten.
How are customer service and support?
Apache Airflow doesn't have its own technical support.
How was the initial setup?
I've been involved in all aspects of Airflow deployment, including building infrastructure using Kubernetes and containers. We faced challenges migrating from enterprise schedulers like Control-M and IBM's scheduler to Airflow, as it lacked some functionality. I had to implement extra features and extensions to support things like individual calendars.
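The reviewer does not say how the calendar extension was built. As a minimal sketch of what an "individual calendar" rule can mean, the following plain-Python functions skip weekends and a caller-supplied holiday list; the function names and the sample holiday are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Set


def is_run_day(day: date, holidays: Set[date]) -> bool:
    """Return True on weekdays (Mon-Fri) that are not listed as holidays."""
    return day.weekday() < 5 and day not in holidays


def next_run_day(after: date, holidays: Set[date]) -> date:
    """Return the first run day strictly after the given date."""
    candidate = after + timedelta(days=1)
    while not is_run_day(candidate, holidays):
        candidate += timedelta(days=1)
    return candidate


# Hypothetical calendar in which 2024-12-25 is a holiday.
holidays = {date(2024, 12, 25)}
print(next_run_day(date(2024, 12, 24), holidays))
```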
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is open-source and free. Hyperscalers like Google (with Composer), Azure, and AWS offer managed Airflow services.
What other advice do I have?
I recommend Apache Airflow because it's open-source, but you must accept its limitations. However, I wouldn't recommend it to companies in biomedical, chemistry, or oil and gas industries with large workflows and thousands of jobs. For example, genomic analysis at an American multinational pharmaceutical and biotechnology corporation involved workflows with around twenty thousand jobs, which Airflow can't handle. Special schedulers are needed for such cases, as even classical schedulers like Control-M and Automic aren't suitable.
I rate the overall solution a seven out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-09-06T00:00:00-04:00
Rated 5 out of 5 by robert74
Works as advertised
I have been using the SRFANROOF now for over 4 years. This product is extremely easy to use and fits the SR42UB, SR45UB and SR48UB. The stock top panel can be removed and replaced with the SRFANROOF in seconds. I have deployed nearly 400 of these devices.
Date published: 2013-02-28T00:00:00-05:00
Rated 5 out of 5 by reviewer1447245
Integrates well with other pipelines and builds different processes well but the scalability needs improvement
What is our primary use case?
We normally use the solution to create specific flows for data transformation. Because our pipelines are well defined, we use Airflow in conjunction with other tools that handle the mediation portion, and Airflow does the processing of that data.
What is most valuable?
The product integrates well with other pipelines and solutions.
The ease of building different processes is very valuable to us. Compared with Kafka, Airflow is better suited to the specific flows where we want to do some transformation. It's very easy to create flows.
What needs improvement?
The graphics in the past have not been ideal.
We have several areas where we feel they could improve in terms of being a little bit more flexible. One is implementation. Even though we customized it, there were some specific things we had to do with the image by itself.
The management integration was challenging as well. It requires a lot of work on our end. We were creating our own way to integrate things specifically with specific tools. There's not really an ease of management out-of-the-box option for integration. We needed to become a little bit creative to solve that ourselves.
The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not.
There is no SDC versioning. There's no version control for pipelines. We have to build several pipelines for several flows, yet there's no version control to generate them.
There's no Python SDK. We need to generate our own scripts and upload them and put them there. However, there's not a realistic case that we can get connected to them. On top of that, the API sets that are provided are very limited. They are not as rich as others. You cannot do much with them.
For how long have I used the solution?
I've been using the solution for maybe three years at this point. It hasn't been too long.
What do I think about the stability of the solution?
The solution is largely stable. Obviously when you start creating more use cases, then you realize the limitations, however, it's not really, really bad.
What do I think about the scalability of the solution?
Due to the fact that the solution is on the cloud, we thought it would be fairly easy to scale. This is proving not to be the case and scalability is limited.
The challenging part is to make it really flexible in a cloud-native environment. With other applications, what you have there is the scalability that can be sensitive to your needs, based on the amount of data you are putting into the flow.
Instead of you having to create your own logic to scale it up, it should be a little more efficient on how it gets integrated into the whole environment. You have to get a little bit creative and put some commands and some logic in there and be monitoring everything. You build everything - versus other options that are more out of the box. With other solutions, if you have these bursts of data they ultimately can scale up and they are more native.
How are customer service and technical support?
Technical support has been pretty good. We don't really have anything to complain about. We're satisfied with the service so far.
Which solution did I use previously and why did I switch?
For this particular category, we tested other tools, but they were more than we needed, and the products we had used in other projects didn't really work for us. Airflow was a bit different, so we decided that it was a nice player and a good open-source tool.
We do use other tools. However, this one seems to work quite well for us.
How was the initial setup?
The initial setup isn't as straightforward as we hoped. It's not as flexible as other options. You need to be a bit creative during the process.
What's my experience with pricing, setup cost, and licensing?
This product is open-source.
What other advice do I have?
We're just customers and end-users. We don't have a special business relationship with Apache.
I'm not sure of which version of the solution we're using. It's likely the most up-to-date, or at the very most back two or three versions as we are not using any of the older versions.
I'd advise others considering the solution to first understand what exactly you're trying to achieve. You either select a non-cloud native Apache workflow manager or select something that is way too big for what you are actually trying to achieve. Understand exactly what you need, the volumes you need to handle, and what the use cases are.
After that, in terms of deployment, that depends on what exactly you are trying to do. If all of your solutions are cloud-native, try to do it with a cloud-native tool. Specifically, go to the CMCS site and look into the solutions that are there. Those have been tested at least for the cloud-native solutions that exist.
Then, just make sure that the components you have will match and will be available to whatever you're trying to build. For example, the user management is something that is important for us and for this specific setup. Probably for some others, it's not going to be.
Take into consideration what the different connection points are, and make sure that they are either supported or that you can support the integration of such items. You need to have a proper developer who can help you build your connector or your API.
In general, I would rate the solution at a seven out of ten. If they fix the APIs and the price on LTK, I'd rate it closer to a nine.
Which deployment model are you using for this solution?
Public Cloud
Date published: 2020-12-23T00:00:00-05:00
Rated 5 out of 5 by reviewer1502001
Comes with direct support for Python, letting us easily automate our pipelines
What is our primary use case?
There are a few use cases we have for Apache Airflow, one being government projects where we perform data operations on a monthly basis. For example, we'll collect data from various agencies, harmonize the data, and then produce a dashboard. In general, it's a BI use case, but focusing on social economy.
We concentrate mainly on BI, and because my team members have strong technical backgrounds we often fall back to using open source tools like Airflow and our own coded solutions.
For a single project, we will typically have three of us working on Airflow at a time. This includes two data engineers and a system administrator. Our infrastructure model is hybrid, based both in the cloud and on-premises.
What is most valuable?
The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.
It's such a natural fit because our engineers are also Python-based, and I think we also quite like that we don't have to learn different kinds of UIs. Airflow is based on standard software packages, so we don't have to learn anything new in the way of opinionated UIs from different vendors.
What needs improvement?
We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult.
When something fails, it's not that easy to troubleshoot what went wrong. Sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly.
The UI is also not that attractive, and I feel that the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I could suggest more improvements in the visual UI. We want to do the ETL as code, but having a nice visual UI to facilitate this process would be great, because it would mean we could also rely on non-technical staff rather than just the three solid technical staff we have here. If there were better features for the UI, like drag-and-drop, then we could expand its use to more of our team.
For how long have I used the solution?
I've been using Apache Airflow for about two and a half years.
What do I think about the stability of the solution?
I think how Apache Airflow works is great. We like the paradigm of ETL as code, which means you define your pipeline as code. All the while, people talk about infrastructure as code, so the practice of ETL as code really fits into that philosophy.
What do I think about the scalability of the solution?
We can scale it well, and it runs on cloud, too. It's compatible with cloud-native technologies like Kubernetes so it has no issues regarding elasticity.
How are customer service and technical support?
We contacted an Airflow developer for assistance once and it was a good experience.
Which solution did I use previously and why did I switch?
We like to explore different tools, mixing and matching them to our needs, but we have never really found any like Airflow that are to our liking. We tried looking into Talend and Alteryx but we didn't find them suitable to our style or approach.
How was the initial setup?
As a first-time user, it was complex and somewhat difficult to set up as there are many components to put together. You've got your data portion, your scheduler portion, your web server portion, etc., and you've got all these parts to set up at first.
The next project that you get to, it gets easier. You really need to acquire a feel for what you're doing, and once you get over that, it's not too bad.
What about the implementation team?
We implemented Airflow ourselves, with the help of our two in-house data engineers and system administrator. It took around three months to get it deployed initially, from concept into production. Then after that, the goal is just to operate it and keep it running.
What's my experience with pricing, setup cost, and licensing?
Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost.
What other advice do I have?
I can recommend Apache Airflow, especially if there are serious data engineers on your team. If, on the other hand, you're looking to enable business users, then it's not suitable.
I would rate Apache Airflow an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Date published: 2021-03-01T00:00:00-05:00
Rated 5 out of 5 by reviewer1539081
Feature rich, open-source, and good for building data pipelines
What is our primary use case?
I'm a data engineer. In the past, I used Airflow for building data pipelines and to populate data warehouses. With my current company, it's a data product or datasets that we sell to biopharma companies.
We are using those pipelines to generate those datasets.
What is most valuable?
I like the UI rework, it's much easier.
I use XCom for derived variables that need to pass between tasks. I don't really tend to use it for passing data, but only for a derived variable. For example, I don't have to re-query something every time, with one-task uses. I use the JSON comp for overwriting certain parameters.
In our use cases, some of the inputs of the dataset are files that we pulled out of S3. Sometimes they need to re-do those files, but we don't need to change any logic, we just need to redo the bills. Rather than redeploying the code to point to a new S3 bucket, we overwrite it to point to a different S3 key.
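The override pattern described above can be sketched as a merge of per-run settings over deployed defaults. In Airflow the per-run settings are available to tasks through the DAG run's conf; the sketch below stays in plain Python and does not import Airflow. The bucket name, keys, and the function name resolve_params are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical defaults baked into the deployed pipeline code.
DEFAULT_PARAMS = {"s3_bucket": "example-bucket", "s3_key": "inputs/2021-03/data.csv"}


def resolve_params(run_conf: Optional[Mapping[str, str]]) -> dict:
    """Merge per-run overrides on top of the deployed defaults."""
    params = dict(DEFAULT_PARAMS)   # copy so the defaults are never mutated
    params.update(run_conf or {})   # a run without overrides keeps the defaults
    return params


# A normal run uses the default key; a re-run points at the corrected file.
print(resolve_params(None)["s3_key"])
print(resolve_params({"s3_key": "inputs/2021-03/data_v2.csv"})["s3_key"])
```

The point of the design is that the pipeline logic and deployment stay unchanged; only the trigger payload differs between the original run and the re-run.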
I have read that there are many different workflow pipelining tools in the biotech space, such as Snakemake and Nextflow.
There is also a CWL plugin that we may look into at some point.
Eventually, we might have a use case where a researcher has a pipeline they run locally, and then we want to convert that to a DAG.
The CWL-Airflow plugin would be useful for that. This might be something to look into later. But that would be like months, or maybe a year from now.
What needs improvement?
I am using a Celery Executor and I find that it crashes and I can't see any logs. I can only assume that it's a memory issue and have to blindly restart until eventually, it starts up again.
One of our use cases is triggered by input rather than running as a batch process. For example, we receive a batch of data and it goes through tasks one, two, and three; when a new batch comes in, each subsequent task should operate on just the data from the prior task.
I am used to working with it so that the output gets written to a table and then the next task selects everything from that upstream table. It could be coded so that you only write the data for that portion of the task. It could handle state machines and state changes as opposed to the batch process.
I would like to see it more friendly for other use cases.
For how long have I used the solution?
In my current company, I just introduced it within the last couple of months. But I've used it at my prior two jobs as well.
We are using Version 2.0.1.
What's my experience with pricing, setup cost, and licensing?
We are using the open-source version of Apache Airflow.
What other advice do I have?
I usually create my own custom operators every time. We upgraded to 2.0, but I am not using any of the new features.
I haven't yet used DAG of DAGs or the new way of using Python functions in the Python operator yet. But we might use DAG of DAGs eventually.
I love this solution and I would rate it a nine out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Date published: 2021-03-31T00:00:00-04:00
Rated 5 out of 5 by reviewer1468407
Scalable, stable and simple installation
What is our primary use case?
We mainly used the solution in banking, finance, and insurance. We are looking for some opportunities in production companies, but this is only at the very early stages.
What is most valuable?
I do not have specific feedback because it is quite early in the review stage for comment.
What needs improvement?
The way the dashboard is connected to the BPM flow could be improved.
For how long have I used the solution?
I have been using the solution for half a year.
What do I think about the stability of the solution?
We have been quite satisfied with the stability of the solution.
What do I think about the scalability of the solution?
The scalability of the solution is good.
How are customer service and technical support?
We had no issue with technical support.
How was the initial setup?
The installation is straightforward.
What's my experience with pricing, setup cost, and licensing?
The pricing for the product is reasonable.
Which other solutions did I evaluate?
We are evaluating Camunda as well as this solution. We are investigating and trying to determine how suitable they are for production facilities. We are also looking at which types of processes each solution actually suits.
What other advice do I have?
We are unsure which solution we will end up with; we are currently testing them. We are trying to get into new business types and new industries. We are looking into how well the solutions can be used in production facilities.
I rate Apache Airflow an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
Date published: 2021-01-18T00:00:00-05:00
Rated 5 out of 5 by Ashok Juyal
Quick and easy to set up, but the technical support needs to be improved
What is our primary use case?
Our primary use case is to integrate with SLAs.
What is most valuable?
The most valuable feature is the workflow.
What needs improvement?
Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required.
In the future, I would like to see a single-click installation.
For how long have I used the solution?
We have been working with Apache Airflow for approximately one month.
What do I think about the scalability of the solution?
In our company, we are doing a POC and there are only three users. We have also implemented it for clients.
We do plan to increase our usage and the POC that we are now working on is something that we will implement for other clients if it works.
How are customer service and technical support?
We are not satisfied with technical support. We rely on using Google to identify solutions for the problems we have.
Which solution did I use previously and why did I switch?
We did not use another similar solution prior to Airflow.
How was the initial setup?
The initial setup was straightforward and it does not take long to complete. The deployment took no more than an hour.
Which other solutions did I evaluate?
We evaluated Control-M and another similar product from IBM.
What other advice do I have?
This is a good product and I definitely recommend it.
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
On-premises
Date published: 2020-12-27T00:00:00-05:00
Rated 5 out of 5 by Ariful Mondal
Managing large-scale data pipelines and Python tasks has been made easy
We have been using Apache Airflow for the past 2 years for various use cases such as:
* Data Pipeline building and monitoring
* Automation of data extraction processes and Intelligent Automation
* Web Scraping at scale for financial services
We manage large-scale data processing workloads using DAGs (Directed Acyclic Graphs), a core concept of Apache Airflow (commonly known as Airflow), which expedites error handling and logging. It helped us manage complex workflows and orchestrate tasks efficiently.
I found the following features very useful:
* DAG - Workload management and orchestration of tasks using
* TaskFlow API - moving Python tasks has been made easy, with cleaner DAGs using the @task decorator in Python
* Connection and Hooks - interface to connect external systems
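To make the DAG idea from the list above concrete, here is a small sketch using only Python's standard library (graphlib, available from Python 3.9). It is not Airflow code: it only shows how declared dependencies determine a valid execution order. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key maps a task to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}

# A topological order runs every task only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# "Acyclic" matters: a dependency loop has no valid order and is rejected.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```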
To implement the various useful functionalities of Airflow effectively, you need to be a very good Python programmer. The UI could be improved with more user-friendly features for non-programmers, reducing the amount of coding required.
Date published: 2021-10-17T00:00:00-04:00
Rated 5 out of 5 by Ravan Nannapaneni
Beneficial for creating and scheduling jobs, but stability needs improvement
What is our primary use case?
Apache Airflow is utilized for automating data engineering tasks. When creating a sequence of tasks, Airflow can assist in automating them.
What is most valuable?
The most valuable feature of Apache Airflow is creating and scheduling jobs. Additionally, the automatic retry of failed jobs is useful.
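In Airflow, retrying is configured per task through operator arguments such as retries and retry_delay. The mechanism itself can be sketched in plain Python as below. This is an illustration of the idea, not Airflow's implementation; run_with_retries and flaky_job are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], retries: int, delay_seconds: float = 0.0) -> T:
    """Call job once, then up to `retries` more times if it raises."""
    if retries < 0:
        raise ValueError("retries must not be negative")
    for attempt in range(retries + 1):
        try:
            return job()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last failure
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


# Hypothetical job that fails twice before succeeding.
calls = {"count": 0}


def flaky_job() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result = run_with_retries(flaky_job, retries=2)
print(result, calls["count"])
```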
What needs improvement?
We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches.
For how long have I used the solution?
I have been using Apache Airflow for approximately three years.
What do I think about the stability of the solution?
We experienced some glitches using the solution with some errors.
I rate the stability of Apache Airflow a five out of ten.
What do I think about the scalability of the solution?
Apache Airflow is scalable because it is within Amazon AWS.
I rate the scalability of Apache Airflow an eight out of ten.
How are customer service and support?
The technical support is good, they are able to debug issues.
How was the initial setup?
The initial setup of Apache Airflow was simple because it was all managed by Amazon AWS. The process took a few minutes.
What's my experience with pricing, setup cost, and licensing?
The solution is free if you use Amazon AWS.
What other advice do I have?
I would recommend this solution for projects even though there have been glitches. Once the solution has become stable it would be ideal for critical projects.
I rate Apache Airflow a seven out of ten.
This is great software to build data pipelines. However, we had many glitches that were causing some problems in production.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Date published: 2023-04-05T00:00:00-04:00
Rated 5 out of 5 by reviewer1715364
A tool that needs to improve its complex initial setup and limited integration capabilities but can be useful in workflow automation
What is our primary use case?
Apache Airflow is useful for workflow automation, making it capable of automating pipelines, data pipelines, and data warehouse processes. I don't have a strong need for Apache Airflow because I do everything with dbt (data build tool), since it has its own integrated workflow process.
I use Fivetran to synchronize my data. I don't need to do any automation on that and don't have any need for workflow automation. I have everything I need.
How has it helped my organization?
We were experimenting with the solution. We never reached the point where we would deploy the solution in the production capacity.
What needs improvement?
The problem with Apache Airflow is that it is an open-source tool. You have to build it into a Kubernetes container, which is not easy to maintain, and I find it to be very clunky.
Additionally, there is room for improvement with DAGs. I had a very hard time building DAGs in Apache Airflow. I decided to use Astronomer, which is on top of Apache Airflow and is supposed to make your life easier. The best part of the solution is the third-party add-on which is Astronomer.
It would be a very nice tool if it could have been an entirely cloud-based solution. Apache Airflow is not so nice when you have a hybrid setup, such as half is on-premises and half of it is on a cloud environment. It should integrate better with the outside world.
For how long have I used the solution?
I have been using Apache Airflow for a couple of months.
What do I think about the stability of the solution?
I have no opinion on the solution's stability. The solution did not get to a production capacity. I couldn't even do file processing with Apache Airflow. None of the engineers could actually help me set up Apache Airflow. I had to give up on the product. Just buy a product that works, and you will be done with it.
How was the initial setup?
The initial setup was complex to deploy on the cloud. Installing the software is very difficult. The documentation is very bad. There is no installer where you can press a button, and it does everything for you. One may need a couple of engineers to install the solution, which is an issue with open-source tools. Price-wise, the software falls on the cheaper side. With Apache Airflow, one may spend much more on engineers.
The solution is deployed purely on the cloud.
What was our ROI?
I didn't experience any ROI using the solution. I could do everything without Apache Airflow since it would have been just a money pit.
What other advice do I have?
I suggest others not use Apache Airflow. If you use Apache Airflow, you will waste your time unless you have a bunch of engineers who already know about the solution.
If you cannot write a DAG within two hours of starting the process, then forget about the tool, and it would be better if you tried to find something else.
Overall, if the tool was working properly, it would be very good, but unfortunately, it is not.
Overall, I rate the solution a five out of ten.
Date published: 2023-07-18T00:00:00-04:00
Rated 5 out of 5 by ABHISHEKTIWARI2
A good tool for managing data pipelines
What is our primary use case?
We use Apache Airflow to send our data to a third-party system.
What is most valuable?
We are already on Python. Since Apache Airflow works very well with Python, we can manage everything and create pipelines there.
What needs improvement?
Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful. Apache Airflow is not that easy to use, but we have gotten used to it.
For how long have I used the solution?
I have been using Apache Airflow for three years.
What do I think about the stability of the solution?
Apache Airflow is a stable solution.
What do I think about the scalability of the solution?
Apache Airflow is not a scalable solution for our use cases. We have a very large list of use cases. Over 10 developers use Apache Airflow in our organization.
How are customer service and support?
Apache Airflow's technical support team is good and provides assistance almost 90% of the time.
How was the initial setup?
Apache Airflow's initial setup is easy. It's not that difficult, but it has a learning curve.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is a cheap solution.
What other advice do I have?
Depending on your use case, if you are looking for a quick solution to work on and know Python, you should go ahead with Apache Airflow.
Apache Airflow is a good enough tool for managing data pipelines. However, the solution is not up to the mark as you scale up and need higher performance. Apache Airflow has introduced the DAG connector for managing data pipelines.
Overall, I rate Apache Airflow an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Date published: 2023-07-12T00:00:00-04:00
Rated 5 out of 5 by Mohd Zaid Waqiyuddin Mohd Zulkifli
A cost-effective solution widely adopted and with a broad open-source community
What is most valuable?
Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop.
What needs improvement?
There is room for improvement in onboarding new people. They should make it simple for newcomers. Otherwise, we have to assign a senior engineer to operate it.
For how long have I used the solution?
I have been using Apache Airflow for five years. We are using the latest version of the solution.
What do I think about the stability of the solution?
I rate the solution’s stability a seven out of ten.
What do I think about the scalability of the solution?
The scalability is good. We have five people working on it for five different projects.
I rate the solution’s scalability a ten out of ten.
Which solution did I use previously and why did I switch?
We have used open-source Apache NiFi for data flow, Talend, and SQL Server Integration Services.
We chose Apache Airflow because it is quite popular, adopted by many people, and has an open-source community and engineers. We moved with the crowd and chose it based on popularity.
How was the initial setup?
I rate the initial setup five on a scale of one to ten, one being difficult and ten being easy.
The deployment required a senior engineer and took a week to complete.
What's my experience with pricing, setup cost, and licensing?
The solution is cheap.
What other advice do I have?
A new user has to be prepared to adopt a new paradigm and treat the data pipeline as code rather than drag and drop. An organization should have a dedicated or experienced person looking into this.
Overall, I rate the solution an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2023-07-31T00:00:00-04:00
Rated 5 out of 5 by SUDHIR KUMAR RATHLAVATH
Enable seamless integration with various connectivity and integrated services, including BigQuery and Python operators
What is our primary use case?
Apache Airflow is like a freeway. Just as a freeway allows cars to travel quickly and efficiently from one point to another, Apache Airflow allows data engineers to orchestrate their workflows in a similarly efficient way.
There are a lot of scheduling tools in the market, but Apache Airflow has taken over everything. With the help of airflow operators, any task required for day-to-day data engineering work becomes possible. It manages the entire lifecycle of data engineering workflows.
How has it helped my organization?
So, for example, let's say you want to connect to multiple sources, extract data, run your pipeline, and trigger your pipeline through other integrated services. You also want to do this at a specific time.
You have a number of operators that can help you with this. For example, you can use the external task sensor (ExternalTaskSensor) to express dependencies between workflows. This means that you can wait for one workflow to complete before triggering another workflow. There are also good operators like the Python operator and the Bash operator. These operators allow you to run your scripts without having to change your code. This is great for traditional projects that already run as batch jobs.
So, let's say you have a scheduling tool called Informatica. You can use Airflow to trigger your VTech scripts through the Informatica jobs. This way, you don't need to use Informatica as a scheduling engine. That's a good way to decouple your data pipelines from Informatica. Airflow is a powerful tool that can help you automate your workflows and improve your efficiency.
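The idea of waiting for one workflow to complete before triggering another can be illustrated without Airflow itself. The following is a minimal sketch using only Python's standard library (Python 3.9 or later). The workflow names are hypothetical, and this is not Airflow's API; in Airflow the same ordering would be expressed with operators and sensors.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow names; each key lists the workflows it must wait for.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return an order in which every workflow comes after its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because each workflow here depends on exactly one predecessor, the resulting order is a single chain from "extract" to "report".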
What is most valuable?
Every feature in Apache Airflow is valuable. Most of the operators and features I've used relate to connectivity services and integrated services because I primarily work with GCP. Specifically, I've utilized the BigQuery connectors and operators, as well as Python operators and other runnable operators like Bash. These common operators have been quite useful in my work.
Another thing that stands out is its ease of use for developers familiar with Python. They can simply write their code and set up their environment to run the code through the scheduling engine. It's quite convenient, especially for those in the data engineering field who are already well-versed in Python. They don't need to install any additional tools or perform complex environment setups. It's straightforward for them.
The graphical interface is good because it presents each workflow as a DAG (directed acyclic graph).
What needs improvement?
One improvement could be the inclusion of a plugin with a drag-and-drop feature. This graphical feature would be beneficial when dealing with connectivity and integration services, like connecting to BigQuery or other systems. For a first-time user, although the documentation is available, it would be more user-friendly to have a drag-and-drop interface within the portal. Users could simply drag and drop components to create pseudo-code, making it more flexible and intuitive.
Therefore, I suggest having a drag-and-drop feature for a more user-friendly experience and better code management.
Moreover, for admins, there should be improved logging capabilities. Apache Airflow does have logging, but it's limited to some database data. It would be better if everything went to the server where it's hosted, probably at the interface level, which would also help the developers.
For how long have I used the solution?
I have been using Apache Airflow since 2014. So, it's been over eight years. We currently use the latest version, 2.4.0.
What do I think about the stability of the solution?
Performance-wise, it's good. I've used two versions, 2.0 and 2.4.
So, it's stable. The version we use now is much more stable.
What do I think about the scalability of the solution?
It's pretty scalable.
How was the initial setup?
The initial setup is easy if it's on the cloud. You get everything, including scalability and usability, so you don't need to worry about storage. It's pretty scalable.
What was our ROI?
The ROI is very high. Most companies are adopting Apache Airflow, and it can be used for a wide variety of tasks, including pulling data, summarizing tables, and generating reports. Everything can be done in Python and integrated within Apache Airflow. The efficiency and ease of use it offers contribute to its high ROI.
Which other solutions did I evaluate?
I've been using Apache Airflow, but I haven't directly compared it with other scheduling tools available in the market. This is because each cloud platform has its own built-in scheduling tool. For instance, if we consider Azure, it has a service called Azure Data Factory, which serves as a scheduling engine.
When you compare Apache Airflow with services like Azure Data Factory and AWS, you'll find that Airflow excels in various aspects. However, one area that could be improved is the integration with hosting services. Currently, Airflow can be hosted on different platforms and machines, which offers flexibility but may require some enhancements to streamline integration with certain hosting services.
What other advice do I have?
Since I have been using Apache Airflow for six to seven years, I would confidently rate the solution a solid ten. We help customers re-design and implement projects using Apache Airflow, and approximately 90% of our work revolves around this powerful tool. So, I rate this product a perfect ten.
Which deployment model are you using for this solution?
Public Cloud
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2023-07-31T00:00:00-04:00
Rated 5 out of 5 by UjjwalGupta
User-friendly, provides a graphical representation of the whole flow, and the user interface is pretty good
What is our primary use case?
The main use case is orchestration. We use it to schedule our jobs.
What is most valuable?
The best thing about the product is its UI. The tool is user-friendly. We can divide our work into different tasks and groups. It gives a graphical representation of the whole flow. It also creates a graph of the complete pipeline. The UI is beautiful. Whenever there is a failure, we can see it at the backend. We can retry at the point where the failure happened. We do not have to redo the whole flow. The user interface is pretty good. It provides details about the jobs. It also provides monitoring features. We can see the metrics and the history of the runs. The administration features are good. We can manage the users.
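Retrying at the point of failure, rather than redoing the whole flow, can be sketched in plain Python. This is an illustrative model under stated assumptions, not Airflow's implementation: the `run_pipeline` helper, the task names, and the `state` dictionary are all hypothetical.

```python
def run_pipeline(tasks, state):
    """Run tasks in order, skipping those already recorded as successful.

    `tasks` is a list of (name, callable) pairs.
    `state` maps a task name to "success" or "failed" from earlier runs.
    """
    for name, func in tasks:
        if state.get(name) == "success":
            continue  # finished in an earlier run, so do not redo it
        try:
            func()
            state[name] = "success"
        except Exception:
            state[name] = "failed"
            break  # stop here; downstream tasks wait for a retry
    return state


calls = []
attempts = {"transform": 0}


def extract():
    calls.append("extract")


def transform():
    # Fails on the first attempt only, to simulate a transient error.
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated transient failure")
    calls.append("transform")


def load():
    calls.append("load")


tasks = [("extract", extract), ("transform", transform), ("load", load)]
state = run_pipeline(tasks, {})     # first run stops at "transform"
state = run_pipeline(tasks, state)  # retry resumes at "transform"
```

On the second run, "extract" is skipped because its success was already recorded, so it executes only once overall.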
What needs improvement?
The solution lacks certain features. We cannot run real-time jobs in the solution. It supports only batch jobs. If we are using ETL pipelines, it can either be a batch job or a real-time job. Real-time jobs run continuously. They are not scheduled. Apache Airflow is for scheduled jobs, not real-time jobs. It would be a good improvement if the solution could run real-time jobs. Many connectors are available in the product, but some are still missing. We have to build a custom connector if it is not available. The solution must have more in-built connectors for improved functionality.
For how long have I used the solution?
I have been using the solution for four to five years.
What do I think about the stability of the solution?
The tool has stability issues that are present in open-source products. It has some failures or bugs sometimes. It is difficult to troubleshoot because we do not have any support for it. We have to search the community to get answers. It would be good if there were a support team for the tool.
What do I think about the scalability of the solution?
We have 5000 to 10,000 users in our organization.
How was the initial setup?
The installation is relatively easy. It doesn't have much configuration. It is straightforward. Some companies provide custom installations. It is easier, but it will be a costly paid service. We generally use the core product. We also have AWS Managed Services. It is a better option if we do not want to do the configuration ourselves.
What other advice do I have?
Apache Airflow is a better option for batch jobs. My advice depends on the tools people use and the jobs they schedule. Databricks has its own scheduler. If someone is using Databricks, a separate tool for scheduling would be useless. They can schedule the jobs through Databricks.
Apache Airflow is a good option if someone is not using third-party tools to run the jobs. When we use APIs to get data or get data from RDBMS systems, we can use Apache Airflow. If we use third-party vendors, using the in-built scheduler is better than Apache Airflow. Overall, I rate the solution a nine out of ten.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-03-19T00:00:00-04:00
Rated 5 out of 5 by Youngin Son
Convenient, easy to learn, has a simple UI, and has a huge user base
What is our primary use case?
My team works on commerce services. We use Airflow to synchronize user information or product information from other services. We use the tool for automating data pipelines. We store user history about API calls and show it on a statistics page, like daily or real-time statistics. We use the solution to aggregate API user's data.
What is most valuable?
Running batch applications on Kubernetes is the most useful feature for my team. It uses Python. It is simple. The learning cost is low. We're using the scheduler. We don't need to care about the batch job every day. We just need to notice when the alerts are firing. It is convenient for us. The product supports many other services, like Kubernetes. I saw some custom applications and programs. The solution integrates very well with other products.
What needs improvement?
The documents do not precisely define the function of the operators. I had to do some experiments to understand the function of the operators. The documentation must be improved. Some parts of the documentation do not precisely explain the parameters and functions. We often need to do experiments to understand how they work.
For how long have I used the solution?
I have been using the solution for one and a half years.
What do I think about the stability of the solution?
I rate the tool’s stability a nine out of ten.
What do I think about the scalability of the solution?
I rate the tool’s scalability a six or seven out of ten. We haven’t horizontally scaled the solution. At least 20% of the teams in my organization are using Airflow to do some batch jobs. There are around 300 users.
How was the initial setup?
I rate the ease of setup an eight out of ten. The product is deployed on the cloud. We release Airflow on Kubernetes. The deployment takes less than five minutes. We use a deployment tool made by our company to deploy the solution.
Which other solutions did I evaluate?
I am also using Apache Kafka.
What other advice do I have?
I will recommend the product to others. The UI is very simple and easy to learn. There are a lot of users of the product. We can find information easily on Google. Overall, I rate the tool an eight out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-03-04T00:00:00-05:00
Rated 5 out of 5 by ManojKumar43
A solution for orchestrating EMR clusters with plug-and-play UI
What is our primary use case?
I have used Apache Airflow for various purposes, such as orchestrating Spark jobs, EMR clusters, and Glue jobs, and submitting jobs within the DCP data flow on Azure Databricks. I have also used it for ad hoc queries, for instance when there's a need to execute queries on Redshift or other databases.
How has it helped my organization?
If you are working with APIs or databases, you must write SQL queries and formulate the right statements to retrieve everything. But with the UI, it's more like plug-and-play. You go there, select the task you want to see, like logs, and click on it. It will promptly display the details of the logs, automatically showing the returned logs. However, if you're accessing logs manually from the web server, you must write commands and perform additional tasks. These overheads can be efficiently managed using the UI.
What is most valuable?
Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details. All logs are readily accessible within the interface itself. Examining the logs lets you discern which steps and processes are being executed.
You don't have to configure SMTP for everything. You only need to configure email settings, such as email on error, failure, or alert access. With Apache Airflow, you can send emails with just a few lines of code. You don't have to write extensive code to configure SMTP; all those configurations can be accomplished within a few lines of code.
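As a rough illustration of those few lines, the sketch below shows the kind of email settings commonly placed in a DAG's `default_args`, assuming Airflow 2.x argument names (`email`, `email_on_failure`, `email_on_retry`, `retries`, `retry_delay`). The recipient address is a placeholder, and the SMTP connection itself is assumed to be configured once at the deployment level.

```python
from datetime import timedelta

# Placeholder address; replace it with a real recipient.
default_args = {
    "email": ["data-team@example.com"],
    "email_on_failure": True,   # send mail when a task fails
    "email_on_retry": False,    # stay quiet on automatic retries
    "retries": 1,               # retry a failed task once
    "retry_delay": timedelta(minutes=5),
}
```

This dictionary is plain Python; it only takes effect once it is passed to a DAG definition, which is not shown here.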
I managed a complex workflow for a finance application project. The project uses Apache Airflow to orchestrate processes, such as retrieving data from SFTP and landing it into S3. From S3, Glue jobs are triggered based on certain conditions. Additionally, the project uses the Glue catalog in Glusoft for data management, all orchestrated using Airflow. Furthermore, various pieces of logic are written in Airflow DAGs to handle scenarios like security mismatches. For instance, files are sent accordingly if there's a missing security.
Apache Airflow triggers a set of tasks based on DAGs. If you have multiple DAGs, such as for the raw, transform, and ready layers, you don't have to trigger each DAG manually. You can integrate them so that triggering one automatically triggers the others. You can also add conditions.
What needs improvement?
Airflow should support dynamic DAG creation.
For how long have I used the solution?
I have been using Apache Airflow for over 8 years.
What do I think about the stability of the solution?
The solution is stable.
I rate the solution's stability a 9.5 out of ten.
What do I think about the scalability of the solution?
We were using Apache Airflow on Kubernetes. As more requests came in, it scaled dynamically based on the available pods. Almost 15 data engineers use Apache Airflow.
I rate the solution's scalability a nine out of ten.
How was the initial setup?
The initial setup is straightforward. It will be tricky if you go with an executor or Kubernetes operator.
If you're into plug-and-play convenience, Apache Airflow supports various deployment methods like Docker, Helm, or Kubernetes. If you want to spin up Airflow, it will take more than 10-15 minutes. However, if you're making customizations or prefer not to use existing databases, the setup time could be extended due to customization requests.
What other advice do I have?
You use Apache Airflow to automate your data pipelines. When you have a data pipeline, such as a Spark job or any other job, and want to automate it, triggering the job manually is not always necessary. You need to configure these DAGs accordingly. For instance, Airflow can initiate the job when the data becomes available. We don't need to keep the cluster running all the time, 24/7. We start the cluster using Airflow when we need to submit the job. Once the job is completed, we terminate the cluster.
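The start, submit, terminate pattern described above can be sketched as follows. This is a minimal model, not a real cloud SDK or Airflow operator: `FakeCluster`, `run_on_transient_cluster`, and the job functions are hypothetical names used only to show that the cluster is shut down whether or not the job succeeds.

```python
class FakeCluster:
    """Stand-in for a real cluster client (hypothetical, not a cloud SDK)."""

    def __init__(self):
        self.running = False
        self.events = []

    def start(self):
        self.running = True
        self.events.append("start")

    def submit(self, job):
        if not self.running:
            raise RuntimeError("cluster is not running")
        self.events.append(f"submit:{job.__name__}")
        return job()

    def terminate(self):
        self.running = False
        self.events.append("terminate")


def run_on_transient_cluster(cluster, job):
    """Start the cluster, run one job, and always terminate afterwards."""
    cluster.start()
    try:
        return cluster.submit(job)
    finally:
        cluster.terminate()  # runs on success and on failure


def spark_job():
    return "done"


def failing_job():
    raise ValueError("simulated job failure")


ok_cluster = FakeCluster()
result = run_on_transient_cluster(ok_cluster, spark_job)

bad_cluster = FakeCluster()
try:
    run_on_transient_cluster(bad_cluster, failing_job)
except ValueError:
    pass  # the job failed, but the cluster was still terminated
```

The `try`/`finally` block is the design choice that matters: it avoids leaving a cluster running, and billing, after a failed job.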
I recommend the solution.
Overall, I rate the solution a nine out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-04-10T00:00:00-04:00
Rated 5 out of 5 by Pravin Gadekar
Has an efficient user interface, but its stability needs improvement
What is our primary use case?
We use the product to orchestrate data engines and process new data files.
What is most valuable?
The product's most valuable feature is scalability. It helps us run hundreds of data jobs every day.
What needs improvement?
The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues. It requires manual intervention to resume jobs. Additionally, while extending the code is possible, it sometimes necessitates creating custom plugins.
For how long have I used the solution?
We have been using Apache Airflow for four years.
What do I think about the scalability of the solution?
We have more than 100 Apache Airflow users in our organization.
How was the initial setup?
The initial setup on Google Cloud using Cloud Composer is straightforward and simplified. However, deploying it on-premises can be complex and challenging.
What was our ROI?
The product is worth the investment.
What's my experience with pricing, setup cost, and licensing?
It is an open-source solution, so there are no hidden fees or licensing costs associated with the software. However, users need to cover the operational costs for the actual infrastructure, such as the virtual machines (VMs).
What other advice do I have?
The directed acyclic graph (DAG) functionality in Apache Airflow has significantly enhanced our workflow management. It provides a visual representation of data processing tasks.
The user interface for monitoring and managing workflows has been excellent, particularly in the latest version. It is difficult for beginners to use the platform, and some training is required.
I recommend the product to others, and it is much better than its competitors. It is open source. I rate it a seven out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-04-01T00:00:00-04:00
Rated 5 out of 5 by Mikalai Surta
Used for the orchestration of data pipelines, but it should have better integration with cloud platforms
What is our primary use case?
We use Apache Airflow for the orchestration of data pipelines.
What is most valuable?
Since it's widely adopted by the community, Apache Airflow is a user-friendly solution.
What needs improvement?
Apache Airflow should have better integration with cloud platforms.
For how long have I used the solution?
I have been using Apache Airflow for a couple of years.
What do I think about the stability of the solution?
Apache Airflow is not a stable solution.
What do I think about the scalability of the solution?
Around ten people are using the solution in our organization.
How was the initial setup?
The solution's initial setup is difficult and should be done by an experienced person.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is a cheap solution.
What other advice do I have?
The solution is deployed on the cloud in our organization. Before choosing Apache Airflow, users should try cloud-native services first.
Overall, I rate the solution a seven out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-02-24T00:00:00-05:00
Rated 5 out of 5 by reviewer1619292
Helps to schedule data pipelines but improvement is needed in workflow integration across the servers
What is our primary use case?
We use the tool to schedule data pipelines. We also use Apache Airflow to orchestrate dbt, another data processing tool. Airflow helps manage dbt processes, which, in our case, load data from our data lake.
What is most valuable?
To increase efficiency, it's quite simple to add dbt tasks to an Apache Airflow pipeline or orchestration file. With the tool, you can specify dependencies.
What needs improvement?
I would like to see workflow integration across the servers.
For how long have I used the solution?
I have been using the product for two years.
What do I think about the stability of the solution?
The solution is stable, but we do have occasional issues. These aren't performance problems as such; rather, the Apache Airflow cluster sometimes crashes when too many tasks run simultaneously.
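One general way to avoid overload from too many simultaneous tasks is to cap concurrency. The sketch below shows the idea with Python's standard library only; `MAX_CONCURRENT`, `stats`, and `task` are hypothetical names, and this is not how Airflow is configured. Airflow exposes its own concurrency limits (pools, for example), which are not shown here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # hypothetical cap on simultaneously running tasks

stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def task(i):
    """Simulated unit of work that records how many tasks run at once."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend to do some work
    with lock:
        stats["active"] -= 1
    return i * i


# The pool never runs more than MAX_CONCURRENT tasks at the same time;
# the remaining tasks wait in a queue until a worker is free.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(task, range(10)))
```

All ten tasks still complete; the cap only limits how many run at any one moment.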
What do I think about the scalability of the solution?
My team has around 11 people using the tool. Each team has a separate server, so we have about 10-20 different Apache Airflow servers. Altogether, I would estimate that around 200 people in our organization use it.
How are customer service and support?
I haven't contacted the support team directly. Our system team does it.
How was the initial setup?
Apache Airflow provides templates for deployment, which makes it easy. When deploying the tool or using dbt, we usually use Kubernetes. We configure Kubernetes to generate a Docker file that sets up the Kubernetes servers for us. This means that when we deploy, it automatically goes to production. The whole process can be completed in seven weeks.
What's my experience with pricing, setup cost, and licensing?
I use the tool's open-source version.
What other advice do I have?
The solution's maintenance involves upgrades. Our system team handles maintenance for us. Their main tasks are upgrading versions and addressing vulnerabilities. It's hard work, but they manage it well. Maintenance takes about two weeks per year for our system team.
I rate the product a seven out of ten and I recommend it to others.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-06-24T00:00:00-04:00
Rated 5 out of 5 by Fadi Bathish
Very stable, easy to learn, and quite configurable
What is our primary use case?
We use this solution to monitor BD tasks.
What is most valuable?
The solution is quite configurable, so it is easy to code within a configuration-driven environment.
The ease of learning and using the solution is quite good. The learning curve is low so new users can learn in a short period of time in comparison to other products.
What needs improvement?
The following should be improved:
* Dashboards
* Security
* Airflow web UI
* Telemetry for logging, monitoring, and alerting purposes
* Documentation
For how long have I used the solution?
I have used the solution for six months.
What do I think about the stability of the solution?
The solution is 99% stable. We have a few glitches here and there but have been able to fix them.
What do I think about the scalability of the solution?
The solution is quite scalable. You can grow in terms of users and environment. You can grow to multi-server applications. You can use the solution on desktops, mobile, or other devices.
How are customer service and support?
We have an internal tech support team so have not needed support from the vendor.
How was the initial setup?
The setup is straightforward. The time for deployment depends on the environment and user base.
What about the implementation team?
We implement the solution in-house. We have one implementation with 60 users and another with 75 users.
We have a tech support team that consists of ten engineers who support implementations. They follow up on issues that might arise during the process automation or implementation of the workflow itself.
For example, our tech support team will resolve a workflow that gets stuck in the MDM workflow engine. The tech team has the knowledge base to resolve any of these issues.
What's my experience with pricing, setup cost, and licensing?
The solution is open source.
What other advice do I have?
I do not have exposure to use cases for large organizations with a huge user environment, so I cannot speak to the solution's effectiveness in these scenarios.
I rate the solution an eight out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Implementer
Date published: 2023-02-21T00:00:00-05:00
Rated 5 out of 5 by reviewer2108010
Connects to everything we need, but doesn't support development through the UI
What is our primary use case?
We were using Apache Airflow for our orchestration needs. We used it for all the jobs that we had created in Databricks, Fivetran, or dbt. These were the three primary tools that we were using. There were a few others, but these were the three primary tools. So, Apache Airflow was for the job orchestration and connecting them to each other for building our entire data pipeline. We were also using Apache Airflow for dbt CI/CD purposes.
What is most valuable?
The most valuable feature is that it's the most popular data orchestration tool in the market right now. It connects to everything you need.
It's open-source. You have a lot of documentation and a lot of people helping out. It has large communities, so if you need something or you want to ask something, you can. Often, someone else would have already asked that question, and they would have already got the answer, and you can just look it up.
Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not. For notifications, it can connect with different messaging tools such as Slack and Teams, as well as with webhooks. It's very easy to use, and it has a lot of features that you would expect from any of the data orchestration tools.
What needs improvement?
Programmatically, it's very good, and it doesn't have any competitors, but you cannot develop anything in Airflow UI. You need to develop everything within the program. In the market, other tools have come up recently as competitors to Airflow, and they also give graphical programming options, whereas Airflow doesn't provide that feature currently. All the DAGs you want to build need to be coded in Python. It doesn't provide features for graphical programming. You cannot drag and drop something, build a pipeline out of that, or orchestrate that with a drag and drop. They have a graphical feature but only for administration purposes, not for development. They don't have a UI for development.
It doesn't support the Windows system. That's a big drawback because a lot of people are using Windows.
For how long have I used the solution?
I used Apache Airflow on my previous project. We had planned to use it in our current project, but due to time issues, we were not able to deploy it. In my previous project, I used it for around eight or nine months.
What do I think about the stability of the solution?
It's a very stable product.
What do I think about the scalability of the solution?
It's highly scalable. You can scale it as much as you want. It depends on the size, and you need to scale up your instance. We had over 3,000 DAGs in our previous project, and we didn't face any issue with even 8 GB memory in our EC2 instance. If you have a lot of DAGs, you might need to scale up, but it's quite lightweight, so you don't need to worry much about that.
How are customer service and support?
It's open source. It was my first project, and I had a few doubts, but everything I needed was available on the internet, so I never had to contact their support. I might have been able to post my questions on their GitHub, but I didn't need that. Airflow has a very large community, so any questions you ask get answered there.
How was the initial setup?
Its setup wasn't done by us. It was done by the Astronomer team on Azure Community Services. So, it was deployed and set up on Azure Community Service. Everything was taken care of by the Astronomer team.
What about the implementation team?
Apache Airflow has two large and popular distributors. There might be others, but the two popular ones are Bitnami and Astronomer. For us, everything was set up by Astronomer.
What's my experience with pricing, setup cost, and licensing?
It's open source. You can install it locally on your own system. If you are deploying it in a production system, you normally deploy it on a cloud service such as EC2, which has some cost. If you are setting up a Docker container or something similar for Apache Airflow yourself, which is quite easy, you can do pretty much everything online. I have set it up on my local system, and it doesn't take a long time. You can customize it for your project, such as selecting different repository databases or selecting different Celery or web services, which is good.
If you are going with a service provider such as Astronomer or Bitnami, they will charge you because they are a distributor of Airflow. They have some of their own features and their own support. They will charge you if you are going with them.
What other advice do I have?
If you are on a Mac or Linux system, it's very easy to install. You can just go to the Apache website to install it, and you can start working. However, Apache Airflow doesn't support installation through a Windows .exe, so some knowledge of Docker containers or WSL will be useful.
Other than that, Astronomer has an instructor called Marc Lamberti who is very popular in the Airflow community. He has YouTube videos. In five minutes, he can teach you how to set up Airflow or what DAGs are. He has five or six videos, and he gets into the details with his videos. So, if you have no idea about Apache Airflow and you don't want to go through all the documentation, you can start with those videos, but if you have a Mac or Linux system, you can directly install it on your system.
I'd rate it a seven out of ten because it doesn't support Windows, and it doesn't support graphical designing, so we cannot create DAGs in the UI. We can administer and look at DAGs through the UI, but we cannot create DAGs through the UI. Other orchestration tools that are available in the market provide that feature.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2023-03-04T00:00:00-05:00
Rated 5 out of 5 by Nomena NY HOAVY
An easy to implement and flexible solution
What is our primary use case?
Currently, I am a lead data scientist. Our primary use cases for Apache Airflow are for all orchestrations, from the basic big data lake to machine learning predictions. It is used for all the MLS processes. It is also used for some ELT, to transform, load, and export all big data from restricted, unrestricted, and all phase processes.
What is most valuable?
The user experience of Apache Airflow is good. The solution is flexible across programming languages and frameworks. I also value that it can be used for monitoring. Apache Airflow helps to easily integrate data sources with other products.
What needs improvement?
Apache Airflow could be improved by integrating some versioning principles. Currently, we have to swap some tags in our flow. It would be useful if we could version the whole project at once and compare which scripts have changed from last year to this year, or from last month to this month.
For example, we have a flow for one project. To version it, we need to check it piece by piece to identify which tags and which scripts changed. All of this has to be done manually.
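The manual check described here, finding which scripts changed between two versions of a project, can be sketched in plain Python. This is only a minimal sketch and not an Airflow feature: it assumes two snapshot folders of the same project exist on disk, and the function names (file_digests, compare_versions) and the demo file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_versions(old_root, new_root):
    """Report which files were added, removed, or changed between two snapshots."""
    old, new = file_digests(old_root), file_digests(new_root)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Demo with two throwaway snapshots (hypothetical script names).
with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
    (Path(old_dir) / "extract.py").write_text("print('v1')\n")
    (Path(old_dir) / "load.py").write_text("print('load')\n")
    (Path(new_dir) / "extract.py").write_text("print('v2')\n")
    (Path(new_dir) / "load.py").write_text("print('load')\n")
    (Path(new_dir) / "report.py").write_text("print('report')\n")
    report = compare_versions(old_dir, new_dir)

print(report)
# {'added': ['report.py'], 'removed': [], 'changed': ['extract.py']}
```

Keeping the DAG scripts in a version control system such as Git would give the same comparison with full history; the sketch only shows the idea for two folders.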
For how long have I used the solution?
I have been using Apache Airflow for four months.
What do I think about the stability of the solution?
We have experienced some bugs in Airflow. For example, the solution did not report all the errors explaining why a process failed. We had to investigate to understand why it was not working.
What do I think about the scalability of the solution?
The solution is easy to scale. We have four people in our organization that use Airflow. One is dedicated to the solution, while the others can use it to adjust the flow of their jobs on their own.
How are customer service and support?
We do not use technical support. We are trained to resolve concerns on our own. If a problem were significant, we could call support. However, there is a good developer community around Airflow that can help us resolve issues.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Prior to using Airflow, I used Microsoft SSIS for three years. We made the switch because SSIS uses the drag-and-drop concept, whereas Airflow requires coding. Also, SSIS is oriented toward Microsoft products and is not very flexible.
How was the initial setup?
I am a technician, so the initial setup was intuitive for me. Without experience, it would not be as simple. Experience with configuration parameters is required. The documentation is good; however, it does not mention some features explicitly, so some research is required.
I would rate the ease of implementation a three out of five.
What about the implementation team?
We have dedicated machine learning ops, so we manage all product deployment ourselves. The deployment takes about four days, including two days of administration.
Apache Airflow requires maintenance. It is very important to maintain all the source codes and all the data. We are looking for a platform that would facilitate the maintenance of the project.
What's my experience with pricing, setup cost, and licensing?
We use a community edition of Apache Airflow. It is open-source and free.
What other advice do I have?
Anyone considering Apache Airflow should make sure that they have a good team with experience, including some administration. A strong background will help to understand and exploit the strengths of the platform.
I would rate this solution a nine out of 10 overall.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2022-09-01T00:00:00-04:00
Rated 5 out of 5 by Mahendra Prajapati
A customizable solution, but the integration process could be simplified
What is our primary use case?
Our primary use case for this solution is scheduling task rates. We capture the data from the SQL Server location and migrate it to the central data warehouse.
What is most valuable?
The best feature is the customization that can be done using Python. For example, there are use cases where we have to tweak the algorithm, and with Apache Airflow we have extra functionality that helps to change the underlying process. We can define our algorithms and processes using Python.
What needs improvement?
The solution could be improved by simplifying the integration process and providing access to its support team to guide integration.
For how long have I used the solution?
We have been using this solution for two months and it is deployed on-premises.
What do I think about the stability of the solution?
The solution is stable but primarily depends on the support team and how they manage it.
What do I think about the scalability of the solution?
Apache Airflow is scalable. Approximately 20 people use this solution on my team.
How are customer service and support?
We haven't had any experience with customer service and support.
Which solution did I use previously and why did I switch?
Previously, we were using SQL Server integration tools and SQL Server Integration Services (SSIS) packages. We had project orders and wanted to migrate everything because Airflow is open source and no license is required. We switched to Apache Airflow because we are trying to migrate all the projects developed in SSIS using Python.
How was the initial setup?
The initial setup was straightforward. Once a script is written, it takes four to five minutes to set up.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is open source, so I cannot comment on licensing costs.
Which other solutions did I evaluate?
We chose this solution because it was suitable for our business needs.
What other advice do I have?
I rate this solution a seven out of ten. My advice to new users is to be proficient in Python. The solution is good but can be improved by simplifying its integration process.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2022-09-20T00:00:00-04:00
Rated 5 out of 5 by VenugopalKathirvel
Flexible open-source solution
What is most valuable?
Apache Airflow's best feature is its flexibility.
What needs improvement?
Apache Airflow could be improved with the addition of more frameworks.
For how long have I used the solution?
I've been using Apache Airflow for four years.
What do I think about the stability of the solution?
Apache Airflow is stable.
What do I think about the scalability of the solution?
Apache Airflow is scalable.
How was the initial setup?
The initial setup was very easy.
What about the implementation team?
We used an in-house team.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is open-source and free of charge.
What other advice do I have?
I would rate Apache Airflow eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2022-09-20T00:00:00-04:00
Rated 5 out of 5 by Joaquin Marques
A useful solution to set up workflows and processes
What is our primary use case?
Our primary use case for the solution is setting up workflows and processes, which apply everywhere because most industries are based on them. We've deployed it for all kinds of workflows within the organization.
What is most valuable?
The ability to easily set up and deploy workflows with Airflow is valuable. Additionally, designing processes and workflows is easier, and it assists in coordinating all of the different processes.
What needs improvement?
The solution can be improved by creating a tool that allows us to do these kinds of things graphically instead of just writing scripts. Hence, the graphical user interface can be improved.
For how long have I used the solution?
We have been using the solution for approximately one year and are currently using the latest version.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
The solution is scalable. Approximately hundreds of thousands of people are utilizing it.
How are customer service and support?
We have not had any issues that require customer service and support.
How was the initial setup?
The initial setup is intermediate, and two people are required for deployment.
What was our ROI?
There is a significant return on investment because it's free, open source, and very useful.
What other advice do I have?
I rate the solution an eight out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2022-12-16T00:00:00-05:00
Rated 5 out of 5 by Anandhavelu Arumugam
Useful for scheduling purposes but should include no-code capabilities
What is our primary use case?
I use this solution for scheduling purposes. We have our own Python framework to run jobs and to handle extraction, transformation, and loading.
We have 20 people who are using Airflow. It's being used on a daily basis. We don't have any plans to increase usage because we have low data sets.
The solution is deployed on cloud. The cloud provider is Azure.
What needs improvement?
Everything is in the Python framework now. I would like to see some no-code capabilities and drag and drop abilities in Airflow.
We're expecting a few more improvements in the log generator. Currently, it's very clumsy.
For how long have I used the solution?
I have used Apache Airflow for three years.
What do I think about the stability of the solution?
It's stable.
What do I think about the scalability of the solution?
It's scalable. So far, we haven't needed more scalability because it's totally controlled by administrators.
Which solution did I use previously and why did I switch?
The only difference between Apache Airflow and BPM software is the pricing.
How was the initial setup?
Setup is about medium difficulty. You need to have some prior knowledge and experience with Docker containers and AKS.
What's my experience with pricing, setup cost, and licensing?
It's open-source.
What other advice do I have?
I would rate this solution as seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2022-12-27T00:00:00-05:00
Rated 5 out of 5 by Luiz Cesar Gosi
A useful tool for data orchestration and collecting information
What is our primary use case?
We use Apache Airflow for data orchestration.
What is most valuable?
Apache Airflow is a pretty useful tool for collecting information. Apache Airflow is a pretty easy solution that can be used with Python. The solution's UI allows me to collect all the information and see the code lines.
What needs improvement?
I have some issues with how the solution communicates shared data. Different DAGs often use the same database or data set: sometimes one DAG consumes the same data as another and sends it to a different place. In the UI, I want to be able to see when the same data set is used more than once.
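The kind of overview being asked for, which data sets are read by more than one DAG, can be illustrated with a small mapping. This is a minimal sketch in plain Python, not an Airflow API: the DAG IDs, the data set names, and the shared_datasets helper are all hypothetical, and it assumes you already keep a list of the inputs each DAG reads.

```python
from collections import defaultdict

# Hypothetical inventory: DAG ID -> data sets that the DAG reads.
dag_inputs = {
    "export_to_warehouse": ["sales_db.orders", "sales_db.customers"],
    "export_to_reporting": ["sales_db.orders"],
    "refresh_features": ["events.clicks"],
}


def shared_datasets(dag_inputs):
    """Return the data sets read by more than one DAG, with the DAGs that read them."""
    readers = defaultdict(list)
    for dag_id, datasets in dag_inputs.items():
        for dataset in datasets:
            readers[dataset].append(dag_id)
    return {ds: sorted(dags) for ds, dags in readers.items() if len(dags) > 1}


print(shared_datasets(dag_inputs))
# {'sales_db.orders': ['export_to_reporting', 'export_to_warehouse']}
```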
For how long have I used the solution?
I have been using Apache Airflow for five years.
What do I think about the stability of the solution?
I rate Apache Airflow a seven out of ten for stability.
What do I think about the scalability of the solution?
I rate Apache Airflow an eight out of ten for scalability. Around 400 users are using the solution in our organization.
Which solution did I use previously and why did I switch?
I previously used Control-M and some AWS and Google Cloud Platform tools.
How was the initial setup?
Apache Airflow's initial setup is pretty straightforward. It is quite intuitive to set up, and creating DAGs is easy.
What about the implementation team?
It takes around two days to deploy Apache Airflow. A DAG can be created in just a few hours.
What other advice do I have?
Apache Airflow is deployed on-cloud in our organization.
Overall, I rate Apache Airflow a nine out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2023-10-24T00:00:00-04:00
Rated 5 out of 5 by SabinaZeynalova
Can be used with multiple systems and servers, Kubernetes systems, and dashboard systems
What is our primary use case?
We use Apache Airflow for the automation and orchestration of model deployment, training, and feature engineering steps. It is a model lifecycle management tool.
How has it helped my organization?
We have an integration with Apache Airflow in our portal for messaging. We use group and transformation data from Redshift to Tesco, and then create a call flow to the router. This is a source of data leakage, such as data engineering and machine learning, especially in a HIPAA environment. We need to check the evolution steps in the pipeline. In production, we only have two cases. Sometimes, we need customer data not in the database, which we get from object storage. The call flow from Redshift to Tesco involves transforming the data and then generating it with the router or Kibana router for the policy. The data is then transformed and sent to the dashboard or data warehouse.
What needs improvement?
Airflow is a pipeline for transferring code by clients, but Apache Airflow does not have any solution for model experiments. There is a need for more features for experimental evaluation steps.
For how long have I used the solution?
I have been using Apache Airflow for one and a half years.
What do I think about the stability of the solution?
The product is stable. I rate the solution’s stability an eight out of ten.
What do I think about the scalability of the solution?
20 users are using this solution in our organization. I rate the solution’s scalability an eight out of ten.
How was the initial setup?
The initial setup is not complex and can be done by two people. However, open-source on-premises solutions have some difficulties. We can schedule Apache Airflow on Kubernetes. Space limitations and installation issues may arise, as we do not have full control over Kubernetes cluster resources, and our administration rights are limited. I rate the initial setup a six out of ten, where one is difficult, and ten is easy.
What other advice do I have?
I recommend Apache Airflow because it is still profitable and can be used with multiple systems and servers, Kubernetes systems, and dashboard systems. You can use it to get social media and other data, but it can be expensive. Overall, I rate the solution a nine out of ten.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2023-09-28T00:00:00-04:00
Rated 5 out of 5 by Damian Bukowski
A license-free tool that is not just easy to learn but also easy to use
What is our primary use case?
In my company, we use Apache Airflow as an orchestrator because we have a lot of business use cases that involve automating people's jobs. For example, someone takes a file and moves it from one folder to another, and we have a lot of scripts that do this in PL/SQL or Bash pipelines. We decided to move all of this so that it is orchestrated through one hub application. Instead of having a few things running on the Oracle database and a few things running on local machines, we wanted it all orchestrated through one tool, which is why we chose Apache Airflow.
What is most valuable?
I like that Apache Airflow is written in Python, which makes it easy to use and learn. I like Apache Airflow's versatility. Essentially, if you want to do something, there is generally a hook that you can use with Apache Airflow, especially if you use solutions from big companies like Google or Microsoft. Many providers are not from Apache since, with Apache Airflow, it is very easy to develop and integrate applications from various developers.
What needs improvement?
The only thing I would like Apache to do is introduce an integration with the Oracle database, because Airflow currently supports primarily Postgres and MySQL. Oracle is something that many companies use as a production database; it is paid rather than free and offers more extended support. Even though Airflow uses Python, and Python has modules for working with Oracle databases, it would be safer and more convenient to do it through Apache Airflow itself and not through Python scripts. I want to see Apache Airflow have more integrations with production-grade databases since it is an area where the product currently falls short.
For how long have I used the solution?
I have been using Apache Airflow for a year and a half. Our company has a production environment for Apache Airflow since we are familiarizing ourselves with the product currently. I use Apache Airflow Version 2.6.1, which is the most stable one. Regarding Apache Airflow, I don't know if any recent updates were released. I am an end-user of Apache Airflow.
What do I think about the stability of the solution?
Considering that in my company, we have Apache Airflow deployed on-premises, I rate the stability of the product a ten out of ten since I haven't seen any issues. If Apache Airflow had been deployed on the cloud, then it wouldn't have been very stable.
What do I think about the scalability of the solution?
I rate Apache Airflow's scalability an eight out of ten because you can scale it however you want since it is in Python. There are some limitations in Apache Airflow. Airflow does not process or hold any data, so if you have many scripts running on this tool, then even small variables stored in the database will eventually overflow the database. By design, Airflow is scalable up to a certain point, but I don't imagine anyone will reach that point. The product's scalability has some limitations, so I cannot give it a ten out of ten, though I think it is pretty much a perfect tool.
I use Apache Airflow daily in my company.
How are customer service and support?
My company had directly contacted the technical support team of Apache, but I used Apache's GitHub Pages, along with its documentation, which was very thorough and helpful. Considering the documentation and stuff Apache provides online as support, I would give it ten out of ten, even though I have not personally spoken to Apache's support team.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup is simple. Using pip, you type in the version of the Airflow and download it, which is very convenient.
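As an illustration of that pip-based install, the Airflow documentation recommends pinning the version and passing a matching constraints file. The sketch below only builds the command; it does not run it. The helper name pip_install_command is hypothetical, and the version 2.6.1 with Python 3.10 is just an example combination, so check it against your own environment.

```python
import sys


def pip_install_command(airflow_version, python_version=None):
    """Build a pip command that installs a pinned Airflow with its constraints file."""
    if python_version is None:
        # Default to the interpreter running this script, for example "3.10".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    constraints = (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )
    return [
        sys.executable, "-m", "pip", "install",
        f"apache-airflow=={airflow_version}",
        "--constraint", constraints,
    ]


command = pip_install_command("2.6.1", "3.10")
print(" ".join(command[1:]))
```

To actually install, the list could be passed to subprocess.run(command, check=True), ideally inside a virtual environment.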
The solution is deployed on an on-premises model.
The solution can be deployed in a couple of minutes.
Almost ten RPA DevOps engineers in my organization use Apache Airflow.
What's my experience with pricing, setup cost, and licensing?
As far as I know, Apache Airflow is a product that is free of licenses, meaning there is no need to buy a license.
What other advice do I have?
Apache Airflow recently introduced a new way of writing scripts, and I quite like it. It's very convenient to write DAGs with the TaskFlow API instead of using the "with" clauses that Python has, so Apache is improving Airflow's technology as Python improves.
Apache Airflow does require maintenance.
For Apache Airflow, two engineers are involved in the maintenance phase. As we get more servers in our company, we expect the number of people involved in the maintenance phase to increase from two to five.
I recommend the solution to those requiring an orchestrator to manage all of their different scripts. Considering that Apache Airflow is in Python, you don't need to rewrite anything, since all you need to do is write a short script in Python that will execute the scripts you already have in Bash, PL/SQL, or whatever you want. If someone needs an orchestrator, Apache Airflow is a perfect product.
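The idea of a short Python script that runs the scripts you already have can be sketched with the standard library alone. This is a minimal sketch, not Airflow code: in Airflow itself the same job is usually done with a task per command (for example with BashOperator), and the run_scripts helper and the echo commands below are hypothetical stand-ins for real scripts.

```python
import subprocess


def run_scripts(commands):
    """Run shell commands in order and stop at the first one that fails."""
    results = []
    for command in commands:
        completed = subprocess.run(
            command, shell=True, capture_output=True, text=True
        )
        results.append((command, completed.returncode, completed.stdout.strip()))
        if completed.returncode != 0:
            # Later steps depend on earlier ones, so do not continue.
            break
    return results


# Stand-ins for existing scripts; the third one fails on purpose.
results = run_scripts(["echo extract", "echo load", "exit 3", "echo never-runs"])
for command, code, output in results:
    print(f"{command!r} exited with {code}: {output}")
```

An orchestrator adds what this sketch lacks: scheduling, retries, logs per task, and a UI to see which step failed.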
I rate the overall product a nine out of ten.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2023-10-09T00:00:00-04:00
Rated 5 out of 5 by Fr Br
Well-documented with plenty of resources available online but improvement needed in automation capabilities
What is our primary use case?
Our use cases are a bit complex, but primarily for data extraction, transformation, and loading (ETL) tasks.
What is most valuable?
It's well-documented and has plenty of resources online, making it easy to get started. On the other hand, there aren't many options for drawing processes, that is, for visually designing workflows and automatically generating processes as some dedicated ETL tools do.
What needs improvement?
The automation capabilities could be improved; a visual workflow designer and a graphical tool to reduce coding would be very helpful. But for now, it's sufficient for our simple workflows.
For how long have I used the solution?
We've only been using it for very basic processes, for about six months now.
What do I think about the stability of the solution?
There were no problems. It is a pretty stable solution. I would rate the stability a nine out of ten.
What do I think about the scalability of the solution?
We're not using it anywhere near its limits, so scalability hasn't been an issue. We're running it on a virtual server, which we can easily upgrade if needed.
We currently have three end users. We plan to scale it up to 60 users.
How are customer service and support?
I only relied on community support.
Which solution did I use previously and why did I switch?
We chose Airflow without considering other solutions because it's a worldwide standard, well-established, and open source. For our purposes, it was sufficient.
How was the initial setup?
The initial setup was easy. It took one day to configure and deploy the product.
For now, we only have one person for maintenance, who knows Airflow, manages the updates, and keeps it running smoothly.
What's my experience with pricing, setup cost, and licensing?
For the time being, it doesn't cost anything.
Which other solutions did I evaluate?
I was considering Bonita because I saw it in a demo, but that's all.
What other advice do I have?
It's well-suited for certain simple programs. Overall, I would rate the solution a seven out of ten.
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-01-15T00:00:00-05:00
Rated 5 out of 5 by Sanket Suhagiya
Efficient pipeline building with intuitive UI and powerful Python features
What is our primary use case?
The primary use case for us is ETL pipelines. We write some pipelines to ingest the data. That is the primary use. And, secondly, we use it to run some scheduling and orchestration. We need to run some automation jobs every day. So, we just write an Airflow task and pipeline that runs every day or every hour or however we need. Those are the two things we use it for.
How has it helped my organization?
Since integrating Airflow, we are efficiently able to build pipelines around it in days. If there is a requirement within days or at the end of the week, we can create a pipeline for it.
What is most valuable?
Declaring pipelines in Python is very powerful, and the learning curve is gentle. The UI is also very intuitive, and it makes sense. The core features are strong, supported by Apache Airflow variables, DAGs, and connections. Connections make it really extensible through plug-ins and custom modules we can write around it.
What needs improvement?
The UI is a little bit outdated according to modern standards. The UI can be enhanced to support some modern standards. Maybe small things such as dark mode and some proper aesthetics can be implemented.
For how long have I used the solution?
I have been working with Airflow for the past two years.
What do I think about the stability of the solution?
We have not faced any performance issues. Our team follows a custom deployment and uses a Kubernetes runner in the backend, so we can scale it as we need. Scalability-wise, we have not faced any issues.
What do I think about the scalability of the solution?
We use Kubernetes on the backend, which allows us to scale it as needed. We have not faced any issues with scalability.
How are customer service and support?
The team prepared comprehensive user guides and FAQs. I do not remember raising any tickets or concerns with tech support.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I have heard of people using Hadoop and some Informatica flows. Informatica flows were pretty rigid, and building custom solutions was more difficult with those.
How was the initial setup?
We used the help of the Astronomer company for setting up Airflow. Setting up the pipelines is straightforward. We have a CI/CD system, and we just write a Python script, and the pipeline is up and running in minutes.
What about the implementation team?
We used the help of the Astronomer company for setting up Airflow.
What was our ROI?
I might not have the numbers for the investment. Whatever the investment, we can efficiently build pipelines around it in days. If there is a requirement within days or at the end of the week, we can create a pipeline for it. So, the ROI should be good.
What other advice do I have?
If it is a large deployment, it is good to go with a managed approach where someone else would be managing for us. If it is small, we can go on our own by spinning up some Kubernetes clusters and deploying it in the cloud.
I'd rate the solution nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-10-02T00:00:00-04:00
Rated 5 out of 5 by Miodrag Milojevic
Streamlines complex data workflows with its user-friendly interface, robust scheduling, and monitoring capabilities, offering scalability and efficient orchestration of diverse sources
What is our primary use case?
It serves as a versatile tool for data ingestion, enabling various tasks including data transformation from one type or format to another. It facilitates seamless preparation and processing of data, supporting diverse operations such as format conversion, type transformation, and other related functions.
How has it helped my organization?
We leverage Apache Airflow to orchestrate our data pipelines, primarily due to the multitude of data sources we manage. These sources vary in nature, with some delivering streaming data, while others follow different protocols such as FTP or utilize landing areas. We utilize Airflow for orchestrating tasks such as data ingestion, transformation, and preparation, ensuring that the data is formatted appropriately for further processing. Typically, this involves tasks like normalization, enrichment, and structuring the data for consumption by tools like Spark or other similar platforms in our ecosystem.
The scheduling and monitoring functionalities enhance our data processing workflows. While the interface could be more user-friendly, proficiency in scheduling and monitoring can be attained through practice and skill development.
The scalability of Apache Airflow effectively accommodates our increasing data processing demands without issue. While occasional server problems may arise, particularly in this aspect, overall, the product remains reliably stable.
It offers a straightforward means to orchestrate numerous data sources efficiently, thanks to its user-friendly interface. Learning to use it is relatively quick and straightforward, although some experimentation, practice, and training may be required to master certain aspects.
What is most valuable?
Our data workflow management is greatly streamlined by the use of Apache Airflow, which proves highly beneficial. Its user-friendly interface makes it straightforward to operate, offering a plethora of features for data preparation, buffering, and format conversion. With its extensive capabilities, Airflow serves as a comprehensive tool for managing our data workflows effectively.
What needs improvement?
The current pricing of Apache Airflow is considerably higher than anticipated, catching us off guard as it has evolved from its initial pricing structure. It would be beneficial to improve the pricing structure. Further enhancing the interface would also be highly beneficial.
For how long have I used the solution?
We have been using it for approximately two years.
What do I think about the stability of the solution?
While the stability of the system is satisfactory, maintaining stability requires vigilance and attention to various factors. During usage, occasional issues may arise, particularly when operating on-premises configurations. For instance, a single hard disk failure on a physical node can pose a challenge, necessitating the node's shutdown for disk replacement. However, the process of switching off and on the node is intricate and requires careful handling.
What do I think about the scalability of the solution?
Scalability is achievable, but it comes with its challenges, particularly in terms of temporary downsizing due to failures or other unforeseen circumstances. While scaling up is feasible, each additional node introduced into the cluster adds complexity and raises the likelihood of potential failures. Dealing with failures involves following standard procedures, yet reinstating the cluster to its fully operational state can be a demanding task.
Approximately ten technical staff members and an equivalent number of data scientists utilize the platform. Additionally, a segment of the network team employs it for network quality analysis, leveraging reporting tools built on top of Impala, which is integrated into the cluster.
How was the initial setup?
The setup process is notably intricate, particularly considering our cluster configuration consisting of twelve data nodes and various additional components. Furthermore, unforeseen issues may arise, such as disk space constraints for Airflow or similar challenges, necessitating vigilance and attention to detail to avoid complications.
What about the implementation team?
The initial phase of the deployment process involves creating a comprehensive plan outlining the setup of our cluster, considering all nodes involved. Since we're deploying on-premises, we need to determine which components will reside on physical machines and which can be accommodated on virtual machines or clusters. This assessment will guide the allocation of resources to each server, ensuring an optimal configuration.
Following this, the configuration phase begins, taking into account the specific requirements of our organization and stringent security measures. Access to the clusters must be carefully managed, categorized, and restricted as per security protocols. It's imperative to prepare everything meticulously prior to deployment to ensure a smooth and successful implementation.
We've undertaken the deployment process partially in-house and with the assistance of the system integrator dedicated to this project. For maintenance and deployment tasks, we rely on a team of ten technical personnel. Typically, only two or three individuals are needed to monitor operations and address issues as they arise. Moreover, we have the backing of our system integrator for additional support if necessary.
What's my experience with pricing, setup cost, and licensing?
The pricing is on the higher side.
What other advice do I have?
I would confidently recommend Apache Airflow to others, assuring them of its benefits. In my opinion, it's a mature and efficient product that delivers reliable performance. Overall, I would rate it nine out of ten.
Which deployment model are you using for this solution?
On-premises
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-03-25T00:00:00-04:00
Rated 5 out of 5 by Alvaro De Lucas
Equips users with a comprehensive feature set for managing complex workflows and has a responsive technical support team
What is our primary use case?
We use the product for scheduling and defining workflows. It helps us extensively to manage complex workflows within Cloudera's ecosystem, particularly for handling and processing data.
How has it helped my organization?
The solution has been beneficial in automating and managing our data workflows efficiently. It has integrated well with our Cloudera environment, enabling us to handle complex workflows with greater ease and reliability.
What is most valuable?
The solution's most valuable feature is its ability to run workflows without saving changes. It allows us to execute tasks without permanently altering our configurations, which is useful for temporary adjustments and testing.
What needs improvement?
One area for improvement would be to address specific functionalities removed in recent updates that were previously useful for our operations.
Additional features that could enhance the product include more flexibility in parameterization and improved tools for managing and debugging workflows.
For how long have I used the solution?
I have been working with Airflow for approximately a year and a half, focusing on the current version for the past eight months.
What do I think about the stability of the solution?
The product has been stable in our environment.
What do I think about the scalability of the solution?
The product is scalable.
How are customer service and support?
The technical support team has been responsive and helpful. They addressed issues related to removed functionalities and ensured critical features were restored in subsequent updates.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We previously used Hortonworks but switched to Cloudera CDP. We also used other Cloudera tools but found Airflow to be a better fit for our current needs due to its capabilities in workflow management.
How was the initial setup?
The initial setup was complex due to the integration with various data sources and configuration requirements, but once properly set up, it has proven effective.
What about the implementation team?
The implementation was carried out with guidance from Cloudera's support team, who provided valuable assistance in configuring the solution to meet our requirements.
Which other solutions did I evaluate?
We evaluated other data workflow solutions but found Airflow the most suitable due to its integration with Cloudera and comprehensive feature set for managing complex workflows.
What other advice do I have?
Airflow integrates well with Cloudera and effectively supports complex operations. However, users should be aware of changes in functionality between versions and plan accordingly.
Overall, I rate it a nine out of ten.
Disclaimer: My company has a business relationship with this vendor other than being a customer: Partner
Date published: 2024-09-19T00:00:00-04:00
Rated 5 out of 5 by Prathamesh D Marathe
Easy to use and implement its functionalities
What is most valuable?
Apache Airflow is an open-source tool that can be integrated with any cloud environment, and that is its main valuable feature. It can be used to run multiple files, it helps with orchestration, and it allows our company to send notifications to teams via email.
What needs improvement?
I have not come across any major challenges with the product, with one exception.
The scripts that we use in our company rely on Python package dependencies, but those are lost when Apache Airflow starts running for a particular test.
The handling of built-in Python package dependencies in Apache Airflow is therefore an area that needs improvement.
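One possible way to surface this kind of problem early is a preflight check that runs at the start of a script and reports which packages are not visible to the interpreter. The following is only a minimal sketch using the Python standard library; the function name `missing_packages` and the example package list are hypothetical, and this is not how Airflow itself manages dependencies.

```python
from importlib import metadata


def missing_packages(required):
    """Return the names in `required` that are not installed for this interpreter."""
    missing = []
    for name in required:
        try:
            # metadata.version raises PackageNotFoundError for unknown distributions.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical dependency list for a script; adjust to the real requirements.
    required = ["great-expectations", "pandas"]
    absent = missing_packages(required)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All required packages are available.")
```

Failing fast with a clear list of missing packages is usually easier to debug than an import error in the middle of a run.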
For how long have I used the solution?
I use the solution in my company for orchestration, since we have data coming from different sources. Once the data arrives, we consume it and then run the transformation in SQL, using a few Python or SQL scripts. We run the data quality check with Great Expectations, and after that the data is loaded into the target database.
What do I think about the stability of the solution?
It is a stable solution. Apart from the issues with built-in Python package dependencies, Apache Airflow provides a stable environment.
What do I think about the scalability of the solution?
Around 70 to 80 people in my company use the product, as it is used across all the projects.
How are customer service and support?
I have never contacted the solution's technical support team.
How was the initial setup?
I was not directly involved in the full installation, but I have worked on the tool's local installation, which was an easy process.
The solution is deployed with the help of the cloud services offered by AWS.
What's my experience with pricing, setup cost, and licensing?
It is an open-source tool. There are no additional fees or charges associated with the product. Expenses are associated with only the machines that our company uses on AWS.
What other advice do I have?
The DAG (directed acyclic graph) functionality has enhanced our company's workflow management. It shows which source tasks are running and which target tasks follow them, and it helps us understand how the data flows. Within a DAG, we can group tasks, which helps us see which group of tasks is running.
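To illustrate the idea of source tasks, subsequent tasks, and task groups, here is a minimal sketch in plain Python (standard library only, not Airflow code). The task names loosely mirror the pipeline described earlier in this review, and the group names and helper functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (its upstream tasks).
upstream = {
    "ingest": set(),
    "transform_sql": {"ingest"},
    "quality_check": {"transform_sql"},
    "load_target": {"quality_check"},
}

# Hypothetical grouping of tasks, similar in spirit to grouping tasks in a DAG.
groups = {
    "extract": ["ingest"],
    "process": ["transform_sql", "quality_check"],
    "publish": ["load_target"],
}


def downstream_of(task, upstream):
    """Return every task that runs, directly or indirectly, after `task`."""
    found = set()
    frontier = [task]
    while frontier:
        current = frontier.pop()
        for candidate, deps in upstream.items():
            if current in deps and candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found


def group_of(task, groups):
    """Return the name of the group containing `task`, or None if ungrouped."""
    for name, members in groups.items():
        if task in members:
            return name
    return None


# A valid execution order in which every task follows its upstream tasks.
execution_order = list(TopologicalSorter(upstream).static_order())

if __name__ == "__main__":
    print("Execution order:", execution_order)
    print("Downstream of transform_sql:", sorted(downstream_of("transform_sql", upstream)))
    print("Group of quality_check:", group_of("quality_check", groups))
```

Because the graph is acyclic, a valid order always exists, and the same dependency map answers both questions: what must finish before a task, and what is affected after it.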
Regarding Apache Airflow's UI for monitoring and managing workflows, it allows one to add variables and to check the status of running tasks or of the previous run of a particular DAG. Our company can also send notifications via Apache Airflow and check the connection and configuration details.
I recommend the tool to others who plan to use it, since it is quite easy to use and its functionalities are easy to implement for the use cases.
I rate the tool a nine out of ten.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclaimer: I am a real user, and this review is based on my own experience and opinions.
Date published: 2024-06-10T00:00:00-04:00